
Followed the official website's instructions, but no answer is output. #143

Closed
listwebit opened this issue Jun 28, 2024 · 6 comments
Labels
question Further information is requested

Comments

@listwebit

[Screenshot attachment: Dingtalk_20240628182248]

@listwebit
Author

transformers version 4.40.2
The same thing happens whether I upgrade or downgrade.

@MikeDean2367
Collaborator

Hello, I will put together a single-file Gradio script for you, expected in the next day or two.

@MikeDean2367
Collaborator

Hello, here is the code with streaming output disabled:

import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Build the Alpaca-style prompt and return only the model's response.
def generate_response(instruction, text="", temperature=1.0, top_p=0.9, top_k=50, num_beams=1, max_new_tokens=50, repetition_penalty=1.0):
    with torch.no_grad():
        if text != "":
            input_text = f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{text}\n\n### Response:\n"
        else:
            input_text = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"

        input_ids = tokenizer.encode(input_text, return_tensors='pt').to('cuda')

        output_ids = model.generate(
            input_ids,
            max_length=input_ids.shape[1] + max_new_tokens,
            temperature=temperature,
            top_k=top_k,
            top_p=top_p,
            num_beams=num_beams,
            repetition_penalty=repetition_penalty,
        )
        
        # Decode only the newly generated tokens; slicing the decoded string by len(input_text)
        # can be off when detokenization does not reproduce the prompt character-for-character.
        output_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
        return output_text

if __name__ == '__main__':
    model_name = "zjunlp/knowlm-13b-zhixi"  
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto",
        load_in_8bit=True
    )

    interface = gr.Interface(
        fn=generate_response,
        inputs=[
            gr.Textbox(label="Instruction", placeholder="Enter instruction here...", lines=2, value="""从给定的文本中提取出可能的实体和实体类型,可选的实体类型为['地点', '人名'],以(实体,实体类型)的格式回答。"""),
            gr.Textbox(label="Optional Text", placeholder="Enter optional text here...", lines=2, optional=True, value="""John昨天在纽约的咖啡馆见到了他的朋友Merry。他们一起喝咖啡聊天,计划着下周去加利福尼亚(California)旅行。他们决定一起租车并预订酒店。他们先计划在下周一去圣弗朗西斯科参观旧金山大桥,下周三去洛杉矶拜访Merry的父亲威廉。"""),
            gr.Slider(label="Temperature", minimum=0.1, maximum=2.0, value=1.0, step=0.1),
            gr.Slider(label="Top p", minimum=0.0, maximum=1.0, value=0.9, step=0.01),
            gr.Slider(label="Top k", minimum=0, maximum=100, value=50, step=1),
            gr.Slider(label="Number of Beams", minimum=1, maximum=10, value=1),
            gr.Slider(label="Max New Tokens", minimum=1, maximum=512, value=50),
            gr.Slider(label="Repetition Penalty", minimum=0.1, maximum=1.6, value=1.0, step=0.1)
        ],
        outputs="text",
        title="Zhixi",
        description="<center>https://github.com/zjunlp/knowlm</center>"
    )

    interface.launch()
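
A note on the script above: in transformers, temperature, top_p and top_k only influence generation when sampling is enabled, so the sliders have no effect unless do_sample=True is also passed to generate(). A minimal sketch of the adjusted call, using the same variable names as the script (do_sample is a standard generate() argument; everything else is unchanged):

# Sketch: enable sampling so the temperature/top_p/top_k sliders actually take effect.
output_ids = model.generate(
    input_ids,
    do_sample=True,                  # without this, generation is greedy or beam search
    max_new_tokens=max_new_tokens,   # equivalent to max_length=input_ids.shape[1] + max_new_tokens
    temperature=temperature,
    top_k=top_k,
    top_p=top_p,
    num_beams=num_beams,
    repetition_penalty=repetition_penalty,
)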

@listwebit
Author

It still doesn't work, and I'm not sure whether it's an environment problem or something else.
With the OneKE model, the error is as follows:

(factory) aeye@aeye-176:/raid/liulei/KnowLM/examples$ python test.py
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:05<00:00,  1.98s/it]
Traceback (most recent call last):
  File "/raid/liulei/KnowLM/examples/test.py", line 40, in <module>
    gr.Textbox(label="Optional Text", placeholder="Enter optional text here...", lines=2, optional=True, value="""John昨天在纽约的咖啡馆见到了他的朋友Merry。他们一起喝咖啡聊天,计划着下周去加利福尼亚(California)旅行。他们决定一起租车并预订酒店。他们先计划在下周一去圣弗朗西斯科参观旧金山大桥,下周三去洛杉矶拜访Merry的父亲威廉。"""),
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/gradio/component_meta.py", line 163, in wrapper
    return fn(self, **kwargs)
TypeError: Textbox.__init__() got an unexpected keyword argument 'optional'
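
This TypeError comes from the optional=True keyword in the script above, which the Gradio version in this environment does not accept on gr.Textbox. A minimal fix, assuming Gradio 4.x, is to drop that keyword; the original default value (or an empty string, as sketched here) can stay:

# Sketch (assumes Gradio 4.x): no `optional` keyword; an empty textbox already behaves as optional.
gr.Textbox(label="Optional Text", placeholder="Enter optional text here...", lines=2, value=""),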


With the knowlm-13b-zhixi model, the error is:

(factory) aeye@aeye-176:/raid/liulei/KnowLM/examples$ python test.py
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Traceback (most recent call last):
  File "/raid/liulei/KnowLM/examples/test.py", line 31, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_name)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 889, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2163, in from_pretrained
    return cls._from_pretrained(
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2397, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 173, in __init__
    self.update_post_processor()
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 186, in update_post_processor
    bos_token_id = self.bos_token_id
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1187, in bos_token_id
    return self.convert_tokens_to_ids(self.bos_token)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 349, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 356, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1206, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 349, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 356, in _convert_token_to_id_with_added_voc


But with generate_lora.py, although there are some warnings, inference works normally:


/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:540: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.4` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:545: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.75` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:562: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `40` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
GenerationConfig {
  "num_beams": 2,
  "repetition_penalty": 1.3,
  "temperature": 0.4,
  "top_k": 40,
  "top_p": 0.75
}

/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:540: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.4` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:545: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.75` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:562: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `40` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
  warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
{"症状": ["咳嗽", "咳痰"], "总病程": "15年", "每次发病时长": "4-5个月", "本次发病时长": "4-5个月"}

@MikeDean2367
Collaborator

If you are using OneKE, please run the following command:

CUDA_VISIBLE_DEVICES=0 python examples/generate_lora_web.py --base_model zjunlp/oneke --model_tag oneke

If you are using ZhiXi, please run the following command:

CUDA_VISIBLE_DEVICES=0 python examples/generate_lora_web.py --base_model zjunlp/knowlm-13b-zhixi --model_tag zhixi

@zxlzr
Contributor

zxlzr commented Jul 2, 2024

Has your issue been resolved?

@zxlzr zxlzr closed this as completed Jul 4, 2024
@zxlzr zxlzr added the question Further information is requested label Jul 4, 2024