
Does ChatGLM's generate method support embedding input? #18

Closed
bingwork opened this issue Oct 27, 2023 · 5 comments
Comments


bingwork commented Oct 27, 2023

(Screenshots 微信截图_20231027210229 / 微信截图_20231027210153: prepare_inputs_for_generation in llama vs. chatglm)

I couldn't find the code for the generate method itself, so I started by analyzing prepare_inputs_for_generation.
As the screenshots above show, llama's prepare_inputs_for_generation supports embedding input, but chatglm's does not.
Does this mean chatglm's generate method does not support embedding input?
Apologies if I have misunderstood.
@xunkai55 @davidlvxin @duzx16
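
For reference, the llama-side behaviour the screenshots point to follows roughly this pattern (paraphrased from what transformers' LlamaForCausalLM.prepare_inputs_for_generation did around that time; the exact code differs by version): inputs_embeds is only forwarded on the very first generation step, before any past_key_values exist, after which generation falls back to the sampled input_ids.

# Paraphrased sketch of the llama-side handling referred to above (not verbatim):
# inputs_embeds is only honoured on the first step, before any past_key_values
# exist; afterwards generation falls back to input_ids.
def prepare_inputs_for_generation(self, input_ids, past_key_values=None,
                                  attention_mask=None, inputs_embeds=None, **kwargs):
    if past_key_values is not None:
        input_ids = input_ids[:, -1:]  # only the newest token needs a fresh forward pass
    if inputs_embeds is not None and past_key_values is None:
        model_inputs = {"inputs_embeds": inputs_embeds}
    else:
        model_inputs = {"input_ids": input_ids}
    model_inputs.update({"past_key_values": past_key_values,
                         "attention_mask": attention_mask})
    return model_inputs

ChatGLM's prepare_inputs_for_generation has no equivalent inputs_embeds branch, which is consistent with the observation above.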

@LittleGreenYuan

Have you come up with anything new? I found a 'PrefixEncoder' class in the source file; it seems to be used for P-Tuning v2.

Defined at:

65: class PrefixEncoder(torch.nn.Module)

Used at:

736: class ChatGLMModel(ChatGLMPreTrainedModel):
789:      def forward():

In that code path, the official implementation feeds these embeddings into the model as past_key_values: Optional[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]] = None.
I'm not sure whether this helps; I happen to be looking into the same thing.
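
For context, a minimal sketch of the P-Tuning v2 idea described above (shapes and names here are my own assumptions, not the actual ChatGLM code): a small prefix encoder turns learned prefix positions into per-layer key/value tensors, which are handed to the model as past_key_values, so the prefix never goes through the normal input_ids path.

import torch

# Illustrative P-Tuning v2-style prefix encoder; the real PrefixEncoder in the
# repo also has an optional MLP projection, and the exact tensor layout
# depends on the model version.
class TinyPrefixEncoder(torch.nn.Module):
    def __init__(self, prefix_len, num_layers, num_heads, head_dim):
        super().__init__()
        self.prefix_len = prefix_len
        self.num_layers = num_layers
        self.num_heads = num_heads
        self.head_dim = head_dim
        # One row per prefix position, projected to a key and a value per layer.
        self.embedding = torch.nn.Embedding(prefix_len, num_layers * 2 * num_heads * head_dim)

    def forward(self, batch_size):
        ids = torch.arange(self.prefix_len).unsqueeze(0).expand(batch_size, -1)
        prefix = self.embedding(ids)  # [batch, prefix_len, layers * 2 * heads * dim]
        prefix = prefix.view(batch_size, self.prefix_len, self.num_layers * 2,
                             self.num_heads, self.head_dim)
        prefix = prefix.permute(2, 0, 3, 1, 4)  # [layers * 2, batch, heads, prefix_len, dim]
        # Split into per-layer (key, value) pairs, matching the
        # past_key_values: Tuple[Tuple[Tensor, Tensor], ...] signature quoted above.
        return tuple((kv[0], kv[1]) for kv in prefix.split(2))

A prefix built this way, e.g. past = TinyPrefixEncoder(16, num_layers, num_heads, head_dim)(batch_size), would then be passed to the model as past_key_values alongside the normal inputs.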

@zRzRzRzRzRzRzR
Member

Not sure; I'll discuss it with the algorithm team.

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR added the enhancement New feature or request label Nov 21, 2023
@Junjie-Chu

I'm trying to use GCG with ChatGLM3.

After I read the code carefully, I think generate() actually supports inputs_embeds, which may solve the issue.
I found that input_ids is only used to provide the shape for creating attention_mask and position_ids. When inputs_embeds is passed in, then according to the code

if inputs_embeds is None:
    inputs_embeds = self.embedding(input_ids)

the values in input_ids do not actually affect the inference results, right?

So in fact, to use inputs_embeds as input, we only need model(input_ids, inputs_embeds)

Not sure if my understanding is correct?

And I find that when running model(input_ids=input_ids.unsqueeze(0), inputs_embeds=full_embeds), the output dimensions of ChatGLM3 seem to be different from those of Llama2 or Vicuna? Do I need to use something like .permute(1, 0, 2)?

Not sure about my understanding, thanks a lot in advance for your support!
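
On the dimension question, a hedged sketch of the check I would do (not from the repo): only permute if the output really comes back sequence-first. Llama2/Vicuna-style code expects [batch, seq_len, ...], so compare the leading dimensions before swapping.

# Hedged helper (assumed names): move a seq-first tensor to batch-first only
# when the shapes indicate it is actually [seq_len, batch, ...].
def to_batch_first(tensor, batch_size):
    if tensor.shape[0] != batch_size and tensor.shape[1] == batch_size:
        tensor = tensor.permute(1, 0, 2)  # [seq, batch, ...] -> [batch, seq, ...]
    return tensor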

@LittleGreenYuan

The code below does pass the embedding as an input, but when using model.generate(), it raises an error: "You passed inputs_embeds to .generate(), but the model class ChatGLMForConditionalGeneration doesn't have its forwarding implemented. See the GPT2 implementation for an example (huggingface/transformers#21405), and feel free to open a PR with it!"

inputs = tokenizer(MutilTalk_Prompt, padding='max_length', max_length=99)
tensor_input_ids = torch.tensor(inputs['input_ids'] + [2])  # append token id 2 to the padded sequence
tensor_input_ids = tensor_input_ids.cuda()
print(tensor_input_ids)
input_embeds = model.transformer.embedding(tensor_input_ids.unsqueeze(0))

# Plain forward pass: passing inputs_embeds works here
outputs = model(input_ids=tensor_input_ids.unsqueeze(0), inputs_embeds=input_embeds)
logits_output = tokenizer.batch_decode(torch.argmax(outputs['logits'], -1).detach().cpu().numpy(), skip_special_tokens=True)
print(logits_output)

# error: generate() rejects inputs_embeds for this model class
outputs = model.generate(input_ids=tensor_input_ids.unsqueeze(0), inputs_embeds=input_embeds)
logits_output = tokenizer.batch_decode(torch.argmax(outputs['logits'], -1).detach().cpu().numpy(), skip_special_tokens=True)
print(logits_output)
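
The plain forward pass above does accept inputs_embeds, which also makes it easy to sanity-check the earlier claim that the values in input_ids stop mattering once inputs_embeds is supplied. A hedged sketch reusing tensor_input_ids and input_embeds from the snippet above (not from the repo):

# Hedged sanity check: same inputs_embeds, different input_ids of the same
# shape; if only the shape of input_ids matters, the logits should match.
with torch.no_grad():
    out_real = model(input_ids=tensor_input_ids.unsqueeze(0), inputs_embeds=input_embeds)
    dummy_ids = torch.ones_like(tensor_input_ids).unsqueeze(0)  # same shape, different values
    out_dummy = model(input_ids=dummy_ids, inputs_embeds=input_embeds)

print(torch.allclose(out_real['logits'], out_dummy['logits']))  # True if input_ids values are ignored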

@Junjie-Chu

Junjie-Chu commented Nov 24, 2023

(Quoting LittleGreenYuan's comment above: the model.generate() call with inputs_embeds raises the "doesn't have its forwarding implemented" error, while the plain model() forward pass works.)

Oh, I get what you mean now. Actually I do not use generate(); I just use model().logits, and in that case it runs well. But the output has a different dimension from that of Llama2 or Vicuna XD

@THUDM THUDM locked and limited conversation to collaborators Nov 24, 2023
@zRzRzRzRzRzRzR zRzRzRzRzRzRzR converted this issue into discussion #436 Nov 24, 2023
