Repeated tokens are generated, compared to beam search, when using fast_contrastive_search on T5 #5
Comments
Hi @wuzhiye7, could you send me the name of the Chinese BART Hugging Face model and your inputs? I would like to test the case and give you some feedback.
Hi @wuzhiye7, I have tested the case on my end. Please follow the instructions below:
(1) First, install simctg from pip:
pip install simctg --upgrade
(2) Second, run the example below:
from simctg.simctgt5 import SimCTGT5
model_name = r'imxly/t5-pegasus'
# initialize tokenizer
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained(model_name)
# initialize model
from transformers.models.mt5.modeling_mt5 import MT5ForConditionalGeneration
t5model = MT5ForConditionalGeneration.from_pretrained(model_name)
model = SimCTGT5(model_name, user_defined_model=t5model, user_defined_tokenizer=tokenizer, special_token_list=[])
print ('------------------------------------------')
# prepare input
text = '我不会贴假睫毛呀,好难!'
ids = tokenizer.encode(text, return_tensors='pt')
print ('The input text is: {}'.format(text))
print ('------------------------------------------')
# generate result
output = model.fast_contrastive_search(input_ids=ids, beam_width=5, alpha=0.5, decoding_len=30,
                                       start_of_sequence_token_id=tokenizer.cls_token_id,
                                       end_of_sequence_token_id=tokenizer.sep_token_id, early_stop=True)
output_text = ''.join(tokenizer.convert_ids_to_tokens(output))
print ('The output text is: {}'.format(output_text))
'''
------------------------------------------
The input text is: 我不会贴假睫毛呀,好难!
------------------------------------------
The output text is: 如何贴假睫毛?我是女生
'''
P.S. If you are interested, the source code of the simctg package is located here (https://github.com/yxuansu/SimCTG/tree/main/simctg). Please let me know if you have any questions.
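For readers wondering what beam_width and alpha control in the call above, here is a rough sketch of a single contrastive search selection step, written from the published description of the method rather than from the simctg source. The function names, the toy vectors, and the reading of beam_width as the number of top-k candidates are assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_step(candidates, context_states, alpha):
    """Pick one token from the top-k candidates (hypothetical helper).

    candidates: list of (token, probability, hidden_state) for the k most
        probable next tokens (k is assumed to play the role of beam_width).
    context_states: hidden states of the tokens generated so far.
    alpha: weight of the degeneration penalty (0 reduces to greedy search).
    """
    best_token, best_score = None, -math.inf
    for token, prob, state in candidates:
        # Penalty: highest similarity to any previous token representation.
        penalty = max(cosine(state, h) for h in context_states)
        score = (1.0 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token

# Toy data, made up for illustration: "A" is more probable, but its state is
# identical to one already in the context, so it is heavily penalized.
context = [[1.0, 0.0]]
cands = [("A", 0.6, [1.0, 0.0]), ("B", 0.4, [0.0, 1.0])]
print(contrastive_step(cands, context, alpha=0.5))  # B
print(contrastive_step(cands, context, alpha=0.0))  # A
```

With alpha=0.5 the candidate that resembles the existing context loses despite its higher probability, which is the mechanism intended to discourage repeated tokens.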
Thanks, it's OK now.
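If anyone wants to compare outputs from beam search and fast_contrastive_search numerically, one possible check is the fraction of repeated n-grams in the generated token list. This is only a sketch: the helper name repeated_ngram_rate is hypothetical and is not part of simctg or transformers.

```python
def repeated_ngram_rate(tokens, n=2):
    """Fraction of n-grams that duplicate an earlier n-gram (0.0 = no repeats)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Sequence shorter than n: nothing to compare.
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

# Made-up token lists for illustration.
print(repeated_ngram_rate(["a", "b", "a", "b", "c"], n=2))  # 0.25
print(repeated_ngram_rate(["a", "b", "c", "d"], n=2))       # 0.0
```

The tokens argument can be the list returned by tokenizer.convert_ids_to_tokens(output) in the example above; a higher value suggests more repetition.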
I used fast_contrastive_search, copied from https://github.com/yxuansu/SimCTG/blob/main/SimCTGEncDec/SimCTGT5/simctgt5.py
The code is as follows:
but it generated repeated tokens: