When running "python examples/generate_lora_web.py --base_model knowlm-13b-zhixi" for the web-based interactive test, entering an instruction and input and clicking Submit always produces an Error in the output, and the Python terminal exits immediately. A screenshot of the run is shown below: #142
Comments
Hello, please make sure you are using the latest code from the repository. If the error persists, please provide the Gradio version you are using; the Gradio version I use is
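As a generic illustration, not taken from this thread: one way to print the locally installed Gradio version for comparison.
# Generic sketch: report the installed Gradio version.
import gradio
print(gradio.__version__)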
Could you help me with this?
Hello, I have tested this locally several times and still see no problem. From your description, the console exits directly without printing any error; without an error message it is hard to track down the issue for you. Here are some possible suggestions:
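As a generic illustration only, and not one of the suggestions above: Python's built-in faulthandler can surface a traceback when the interpreter exits silently, for example because of a segfault inside a native library.
# Generic debugging sketch, not taken from this thread: enable faulthandler early in the script
# so that a fatal crash prints a traceback before the process dies.
import faulthandler
faulthandler.enable()
# Alternatively, enable it from the command line when launching the web demo:
#   python -X faulthandler examples/generate_lora_web.py --base_model knowlm-13b-zhixi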
Hello, based on your feedback I found a related issue; it may be a network problem, and you can refer to the suggestions there :)
Hello, regarding the related AUTOMATIC1111/stable-diffusion-webui#9074 you provided
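Since the maintainer suspects a network problem, here is a generic, hypothetical workaround that is sometimes suggested for proxy-related Gradio errors; it is not quoted from issue #9074 or from this thread.
# Hypothetical sketch: clear HTTP(S) proxy variables for the current process before launching
# the Gradio app, in case a local proxy interferes with its internal requests.
import os
for var in ("http_proxy", "https_proxy", "all_proxy", "HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY"):
    os.environ.pop(var, None)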
Hello, could you help me with this?
Hello, here is my code; it is identical to the code in the repository:
import os
import sys
import fire
import gradio as gr
import torch
import transformers
from peft import PeftModel
from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer
from multi_gpu_inference import get_tokenizer_and_model
from typing import List
from callbacks import Iteratorize, Stream
from prompter import Prompter
from utils import Web
if torch.cuda.is_available():
device = "cuda"
else:
device = "cpu"
try:
if torch.backends.mps.is_available():
device = "mps"
except: # noqa: E722
pass
# model = None
# tokenizer = None
# web_config = None
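# get_kwargs returns the value of each requested argument name from `kwargs`, with None for any missing key.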
def get_kwargs(kwargs, *args):
return [kwargs.get(arg, None) for arg in args]
def main(
load_8bit: bool = True,
load_4bit: bool = False,
base_model: str = None,
model_tag: str = None,
# lora_weights: str = "zjunlp/CaMA-13B-LoRA",
# prompt_template: str = "finetune/lora/knowlm/templates/alpaca.json", # The prompt template to use, will default to alpaca.
server_name: str = "0.0.0.0", # Allows to listen on all interfaces by providing '0.0.0.0'
share_gradio: bool = False,
multi_gpu: bool = False,
allocate: List[int] = None
):
# global model
# global tokenizer
# global web_config
base_model = base_model or os.environ.get("BASE_MODEL", "")
assert (
base_model
), "Please specify a --base_model, e.g. --base_model='huggyllama/llama-7b'"
tokenizer = AutoTokenizer.from_pretrained(base_model)
if device == "cuda":
if multi_gpu:
model, tokenizer = get_tokenizer_and_model(base_model=base_model, dtype="float16", allocate=allocate)
else:
if load_8bit or load_4bit:
device_map = {"":0}
else:
device_map = {"": device}
if load_4bit:
load_8bit = False
model = AutoModelForCausalLM.from_pretrained(
base_model,
load_in_4bit=load_4bit,
load_in_8bit=load_8bit,
torch_dtype=torch.float16,
device_map=device_map,
)
# model = PeftModel.from_pretrained(
# model,
# lora_weights,
# torch_dtype=torch.float16,
# )
elif device == "mps":
model = AutoModelForCausalLM.from_pretrained(
base_model,
device_map={"": device},
torch_dtype=torch.float16,
)
# model = PeftModel.from_pretrained(
# model,
# lora_weights,
# device_map={"": device},
# torch_dtype=torch.float16,
# )
elif device == 'cpu':
model = AutoModelForCausalLM.from_pretrained(
base_model,
torch_dtype=torch.float32,
)
else:
model = AutoModelForCausalLM.from_pretrained(
base_model, device_map={"": device}, low_cpu_mem_usage=True
)
# model = PeftModel.from_pretrained(
# model,
# lora_weights,
# device_map={"": device},
# )
if model_tag is None:
if 'oneke' in base_model.lower():
model_tag = 'oneke'
elif 'knowlm' in base_model.lower():
model_tag = 'zhixi'
else:
assert False, "Please specify the `model_tag`!"
assert model_tag in Web.__SUPPORT_MODEL__
web_config = Web.get_ui(model_tag)
prompter = Prompter(model_name=model_tag)
# unwind broken decapoda-research config
# model.config.pad_token_id = tokenizer.pad_token_id = 0 # pad
# model.config.bos_token_id = tokenizer.pad_token_id = 1
# model.config.eos_token_id = tokenizer.pad_token_id = 2
if not load_8bit and device != "cpu":
model.half() # seems to fix bugs for some users.
model.eval()
if torch.__version__ >= "2" and sys.platform != "win32":
model = torch.compile(model)
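# evaluate() receives the Gradio component values positionally, in the order given by web_config['var_name'].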
def evaluate(
# instruction,
# input=None,
# temperature=0.4,
# top_p=0.75,
# top_k=40,
# num_beams=2,
# max_new_tokens=512,
# repetition_penalty=1.3,
# stream_output=False,
*args
# **kwargs,
):
kwargs = {}
assert len(web_config['var_name']) == len(args), f"{len(web_config['var_name'])} == {len(args)}"
for key, value in zip(web_config['var_name'], args):
kwargs[key] = value
print(kwargs)
generation_var = [
'temperature',
'top_p',
'top_k',
'num_beams',
'max_new_tokens',
'repetition_penalty',
'stream_output'
]
input_var = [
'instruction',
'input',
'schema',
'system_prompt',
]
all_var = input_var + generation_var
instruction, input, schema, system_prompt, \
temperature, top_p, top_k, num_beams, max_new_tokens, repetition_penalty, stream_output = get_kwargs(kwargs, *all_var)
generation_var_args = {}
input_var_args = {}
for cur_var in [(generation_var, generation_var_args), (input_var, input_var_args)]:
_cur_var_list, _cur_arg = cur_var
for var_name in _cur_var_list:
if var_name in kwargs:
_cur_arg[var_name] = kwargs.pop(var_name)
prompt = prompter.generate_prompt(**input_var_args)
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(device)
generation_config = GenerationConfig(
**generation_var_args
# temperature=temperature,
# top_p=top_p,
# top_k=top_k,
# num_beams=num_beams,
# repetition_penalty=repetition_penalty,
# **kwargs,
)
generate_params = {
"input_ids": input_ids,
"generation_config": generation_config,
"return_dict_in_generate": True,
"output_scores": True,
"max_new_tokens": max_new_tokens,
}
if stream_output:
# Stream the reply 1 token at a time.
# This is based on the trick of using 'stopping_criteria' to create an iterator,
# from https://github.com/oobabooga/text-generation-webui/blob/ad37f396fc8bcbab90e11ecf17c56c97bfbd4a9c/modules/text_generation.py#L216-L243.
def generate_with_callback(callback=None, **kwargs):
kwargs.setdefault(
"stopping_criteria", transformers.StoppingCriteriaList()
)
kwargs["stopping_criteria"].append(
Stream(callback_func=callback)
)
with torch.no_grad():
model.generate(**kwargs)
def generate_with_streaming(**kwargs):
return Iteratorize(
generate_with_callback, kwargs, callback=None
)
with generate_with_streaming(**generate_params) as generator:
for output in generator:
# new_tokens = len(output) - len(input_ids[0])
decoded_output = tokenizer.decode(output[1:]) # # `1:` means that we need to filter bos token
if output[-1] in [tokenizer.eos_token_id]:
break
yield prompter.get_response(decoded_output)
return # early return for stream_output
# Without streaming
with torch.no_grad():
generation_output = model.generate(
input_ids=input_ids,
generation_config=generation_config,
return_dict_in_generate=True,
output_scores=True,
max_new_tokens=max_new_tokens,
eos_token_id=tokenizer.eos_token_id
)
s = generation_output.sequences[0]
output = tokenizer.decode(s[1:]) # `1:` means that we need to filter bos token
yield prompter.get_response(output)
gr.Interface(
fn=evaluate,
inputs=web_config['components'],
# inputs=[
# gr.components.Textbox(
# lines=2,
# label="Instruction",
# placeholder="<请在此输入你的问题>",
# ),
# gr.components.Textbox(
# lines=2,
# label="Input",
# placeholder="<可选参数>",
# ),
# gr.components.Slider(
# minimum=0, maximum=1, value=0.4, label="Temperature"
# ),
# gr.components.Slider(
# minimum=0, maximum=1, value=0.75, label="Top p"
# ),
# gr.components.Slider(
# minimum=0, maximum=100, step=1, value=40, label="Top k"
# ),
# gr.components.Slider(
# minimum=1, maximum=4, step=1, value=2, label="Beams"
# ),
# gr.components.Slider(
# minimum=1, maximum=2000, step=1, value=512, label="Max tokens"
# ),
# gr.components.Slider(
# minimum=1, maximum=2, step=0.1, value=1.3, label="Repetition Penalty"
# ),
# gr.components.Checkbox(label="Stream output"),
# ],
outputs=[
gr.Textbox(
lines=5,
label="Output",
)
],
title=web_config['title'],
description=web_config['description'],
).queue().launch(server_name="0.0.0.0", share=share_gradio)
if __name__ == "__main__":
fire.Fire(main)
"""
# multi-gpu
CUDA_VISIBLE_DEVICES=0,1,2,3 python examples/generate_lora_web.py --base_model zjunlp/knowlm-13b-zhixi --multi_gpu --allocate [5,10,8,10]
# single-gpu
CUDA_VISIBLE_DEVICES=0,1,2,3 python examples/generate_lora_web.py --base_model zjunlp/knowlm-13b-zhixi
# testing oneke
CUDA_VISIBLE_DEVICES=0,1 python examples/generate_lora_web.py --base_model zjunlp/oneke --multi_gpu --allocate [16,16]
# testing zhixi
CUDA_VISIBLE_DEVICES=0,1 python examples/generate_lora_web.py --base_model zjunlp/zhixi --multi_gpu --allocate [16,16]
""" 关于第二个问题,应该和torch无关,因为您执行 |
You could try changing line 285's
Hello, here is a minimal piece of code; please check whether it runs correctly:
import gradio as gr
import time
def generate(inputs: str):
return inputs + f" {time.ctime()}"
def main():
gr.Interface(
fn=generate,
inputs=[
gr.components.Textbox(
lines=2,
label="Instruction",
placeholder="<请在此输入你的问题>",
),
],
outputs=[
gr.Textbox(
lines=5,
label="Output",
)
]
).queue().launch()
if __name__ == '__main__':
main()
Has your problem been resolved?
It still hasn't been resolved. Sorry to bother you.
Can you help me?
My problem still hasn't been solved. Hello?
Please try this code:
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# @torch.no_grad()
def generate_response(instruction, text="", temperature=1.0, top_p=0.9, top_k=50, num_beams=1, max_new_tokens=50, repetition_penalty=1.0):
with torch.no_grad():
if text != "":
input_text = f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{text}\n\n### Response:\n"
else:
input_text = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"
input_ids = tokenizer.encode(input_text, return_tensors='pt').to('cuda')
output_ids = model.generate(
input_ids,
max_length=input_ids.shape[1] + max_new_tokens,
temperature=temperature,
top_k=top_k,
top_p=top_p,
num_beams=num_beams,
repetition_penalty=repetition_penalty,
)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
return output_text[len(input_text):]
if __name__ == '__main__':
model_name = "zjunlp/knowlm-13b-zhixi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name, torch_dtype=torch.bfloat16, device_map="auto",
load_in_8bit=True
)
interface = gr.Interface(
fn=generate_response,
inputs=[
gr.Textbox(label="Instruction", placeholder="Enter instruction here...", lines=2, value="""从给定的文本中提取出可能的实体和实体类型,可选的实体类型为['地点', '人名'],以(实体,实体类型)的格式回答。"""),
gr.Textbox(label="Optional Text", placeholder="Enter optional text here...", lines=2, optional=True, value="""John昨天在纽约的咖啡馆见到了他的朋友Merry。他们一起喝咖啡聊天,计划着下周去加利福尼亚(California)旅行。他们决定一起租车并预订酒店。他们先计划在下周一去圣弗朗西斯科参观旧金山大桥,下周三去洛杉矶拜访Merry的父亲威廉。"""),
gr.Slider(label="Temperature", minimum=0.1, maximum=2.0, value=1.0, step=0.1),
gr.Slider(label="Top p", minimum=0.0, maximum=1.0, value=0.9, step=0.01),
gr.Slider(label="Top k", minimum=0, maximum=100, value=50, step=1),
gr.Slider(label="Number of Beams", minimum=1, maximum=10, value=1),
gr.Slider(label="Max New Tokens", minimum=1, maximum=512, value=50),
gr.Slider(label="Repetition Penalty", minimum=0.1, maximum=1.6, value=1.0, step=0.1)
],
outputs="text",
title="Zhixi",
description="<center>https://github.com/zjunlp/knowlm</center>"
)
interface.launch()
May I ask whether your problem has been solved yet?
Hello, when I run this code the result is as follows:
===================================BUG REPORT===================================
It looks like a bitsandbytes version problem; the version I am using is
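As a generic illustration, not taken from this thread: one way to check which bitsandbytes release is installed locally before comparing or upgrading it.
# Generic sketch: report the installed bitsandbytes version.
from importlib.metadata import version
print(version("bitsandbytes"))
# If an upgrade is what is needed, it would look like:
#   pip install -U bitsandbytes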
OK, thank you.
Do you have any other questions?