
Followed the official website's instructions, but no answer is output. #143

Closed
listwebit opened this issue Jun 28, 2024 · 6 comments
Labels
question Further information is requested

Comments

@listwebit

[Screenshot attachment: Dingtalk_20240628182248]

@listwebit
Author

transformers version 4.40.2
The same thing happens whether I upgrade or downgrade.

@MikeDean2367
Collaborator

Hello, I will put together a single-file Gradio script for you, expected in the next day or two.

@MikeDean2367
Collaborator

Hello, here is the code with streaming output disabled:

import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Build the Alpaca-style prompt and return only the model's response.
def generate_response(instruction, text="", temperature=1.0, top_p=0.9, top_k=50, num_beams=1, max_new_tokens=50, repetition_penalty=1.0):
    with torch.no_grad():
        if text != "":
            input_text = f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{text}\n\n### Response:\n"
        else:
            input_text = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"

        input_ids = tokenizer.encode(input_text, return_tensors='pt').to('cuda')

        output_ids = model.generate(
            input_ids,
            max_length=input_ids.shape[1] + max_new_tokens,
            temperature=temperature,
            top_k=top_k,
            top_p=top_p,
            num_beams=num_beams,
            repetition_penalty=repetition_penalty,
        )
        
        # Decode only the newly generated tokens; slicing the decoded string by len(input_text)
        # can be off when detokenization does not reproduce the prompt character-for-character.
        output_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
        return output_text

if __name__ == '__main__':
    model_name = "zjunlp/knowlm-13b-zhixi"  
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto",
        load_in_8bit=True
    )

    interface = gr.Interface(
        fn=generate_response,
        inputs=[
            gr.Textbox(label="Instruction", placeholder="Enter instruction here...", lines=2, value="""从给定的文本中提取出可能的实体和实体类型,可选的实体类型为['地点', '人名'],以(实体,实体类型)的格式回答。"""),
            gr.Textbox(label="Optional Text", placeholder="Enter optional text here...", lines=2, optional=True, value="""John昨天在纽约的咖啡馆见到了他的朋友Merry。他们一起喝咖啡聊天,计划着下周去加利福尼亚(California)旅行。他们决定一起租车并预订酒店。他们先计划在下周一去圣弗朗西斯科参观旧金山大桥,下周三去洛杉矶拜访Merry的父亲威廉。"""),
            gr.Slider(label="Temperature", minimum=0.1, maximum=2.0, value=1.0, step=0.1),
            gr.Slider(label="Top p", minimum=0.0, maximum=1.0, value=0.9, step=0.01),
            gr.Slider(label="Top k", minimum=0, maximum=100, value=50, step=1),
            gr.Slider(label="Number of Beams", minimum=1, maximum=10, value=1),
            gr.Slider(label="Max New Tokens", minimum=1, maximum=512, value=50),
            gr.Slider(label="Repetition Penalty", minimum=0.1, maximum=1.6, value=1.0, step=0.1)
        ],
        outputs="text",
        title="Zhixi",
        description="<center>https://github.com/zjunlp/knowlm</center>"
    )

    interface.launch()
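
A note on the script above: in transformers, temperature, top_p and top_k only influence generation when sampling is enabled, so the sliders have no effect unless do_sample=True is also passed to generate(). A minimal sketch of the adjusted call, using the same variable names as the script (do_sample is a standard generate() argument; everything else is unchanged):

# Sketch: enable sampling so the temperature/top_p/top_k sliders actually take effect.
output_ids = model.generate(
    input_ids,
    do_sample=True,                  # without this, generation is greedy or beam search
    max_new_tokens=max_new_tokens,   # equivalent to max_length=input_ids.shape[1] + max_new_tokens
    temperature=temperature,
    top_k=top_k,
    top_p=top_p,
    num_beams=num_beams,
    repetition_penalty=repetition_penalty,
)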

@listwebit
Author

It still doesn't work, and I'm not sure whether it's an environment problem or something else.
With the OneKE model, the error is as follows:

(factory) aeye@aeye-176:/raid/liulei/KnowLM/examples$ python test.py
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:05<00:00,  1.98s/it]
Traceback (most recent call last):
  File "/raid/liulei/KnowLM/examples/test.py", line 40, in <module>
    gr.Textbox(label="Optional Text", placeholder="Enter optional text here...", lines=2, optional=True, value="""John昨天在纽约的咖啡馆见到了他的朋友Merry。他们一起喝咖啡聊天,计划着下周去加利福尼亚(California)旅行。他们决定一起租车并预订酒店。他们先计划在下周一去圣弗朗西斯科参观旧金山大桥,下周三去洛杉矶拜访Merry的父亲威廉。"""),
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/gradio/component_meta.py", line 163, in wrapper
    return fn(self, **kwargs)
TypeError: Textbox.__init__() got an unexpected keyword argument 'optional'
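
This TypeError comes from the optional=True keyword in the script above, which the Gradio version in this environment does not accept on gr.Textbox. A minimal fix, assuming Gradio 4.x, is to drop that keyword; the original default value (or an empty string, as sketched here) can stay:

# Sketch (assumes Gradio 4.x): no `optional` keyword; an empty textbox already behaves as optional.
gr.Textbox(label="Optional Text", placeholder="Enter optional text here...", lines=2, value=""),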


With the knowlm-13b-zhixi model, the error is:

(factory) aeye@aeye-176:/raid/liulei/KnowLM/examples$ python test.py
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Traceback (most recent call last):
  File "/raid/liulei/KnowLM/examples/test.py", line 31, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_name)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 889, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2163, in from_pretrained
    return cls._from_pretrained(
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2397, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 173, in __init__
    self.update_post_processor()
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 186, in update_post_processor
    bos_token_id = self.bos_token_id
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1187, in bos_token_id
    return self.convert_tokens_to_ids(self.bos_token)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 349, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 356, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1206, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 349, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 356, in _convert_token_to_id_with_added_voc


But with generate_lora.py, although there are some warnings, inference works normally:


/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:540: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.4` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:545: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.75` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:562: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `40` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
GenerationConfig {
  "num_beams": 2,
  "repetition_penalty": 1.3,
  "temperature": 0.4,
  "top_k": 40,
  "top_p": 0.75
}

/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:540: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.4` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:545: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.75` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
/home/aeye/anaconda3/envs/factory/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:562: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `40` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
  warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
{"症状": ["咳嗽", "咳痰"], "总病程": "15年", "每次发病时长": "4-5个月", "本次发病时长": "4-5个月"}

@MikeDean2367
Collaborator

If you are using OneKE, please run the following command:

CUDA_VISIBLE_DEVICES=0 python examples/generate_lora_web.py --base_model zjunlp/oneke --model_tag oneke

If you are using ZhiXi, please run the following command:

CUDA_VISIBLE_DEVICES=0 python examples/generate_lora_web.py --base_model zjunlp/knowlm-13b-zhixi --model_tag zhixi

@zxlzr
Contributor

zxlzr commented Jul 2, 2024

Has your issue been resolved?

@zxlzr zxlzr closed this as completed Jul 4, 2024
@zxlzr zxlzr added the question Further information is requested label Jul 4, 2024