
Could you please provide a simple script to use your multimodal model, like Hugging Face or other multimodal models? #14

Closed
xinsir6 opened this issue Oct 11, 2023 · 13 comments
Labels: good first issue

xinsir6 commented Oct 11, 2023

(screenshot attached)
Thank you! Amazing results, and I hope to use your model to caption images!

1049451037 (Member) commented Oct 11, 2023

Here is a simplified script if you do not need model parallelism:

import argparse

import torch
from sat.model.mixins import CachedAutoregressiveMixin

from models.cogvlm_model import CogVLMModel
from utils.language import llama2_tokenizer, llama2_text_processor_inference
from utils.vision import get_image_processor
from utils.chat import chat

# load the CogVLM chat checkpoint for single-GPU inference (no model parallelism)
model, model_args = CogVLMModel.from_pretrained(
    "cogvlm-chat",
    args=argparse.Namespace(
        deepspeed=None,
        local_rank=0,
        rank=0,
        world_size=1,
        model_parallel_size=1,
        mode='inference',
        skip_init=True,
        fp16=False,
        bf16=True,
        use_gpu_initialization=True,
        device='cuda',
    ))
model = model.eval()

# tokenizer, image preprocessor, and KV-cache mixin for autoregressive decoding
tokenizer = llama2_tokenizer("lmsys/vicuna-7b-v1.5", signal_type="chat")
image_processor = get_image_processor(model_args.eva_args["image_size"][0])
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
text_processor_infer = llama2_text_processor_inference(tokenizer, None, model.image_length)

# run one chat turn on a single image
with torch.no_grad():
    response, history, cache_image = chat(
        "fewshot-data/kobe.png",
        model,
        text_processor_infer,
        image_processor,
        "Describe the image.",
        history=[],
        max_length=2048,
        top_p=0.4,
        temperature=0.8,
        top_k=1,
        invalid_slices=text_processor_infer.invalid_slices,
        no_prompt=False
        )
    print(response)

xinsir6 (Author) commented Oct 15, 2023

Thank you very much. I will try it to caption the images collected from the internet.
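
For that use case, a minimal captioning loop that reuses the objects from the script above might look like the sketch below; the image folder and glob pattern are placeholders, and each image is captioned independently with an empty history.

import glob

with torch.no_grad():
    for path in sorted(glob.glob("my_images/*.jpg")):  # placeholder folder and pattern
        response, _, _ = chat(
            path,
            model,
            text_processor_infer,
            image_processor,
            "Describe the image.",
            history=[],  # fresh history per image
            max_length=2048,
            top_p=0.4,
            temperature=0.8,
            top_k=1,
            invalid_slices=text_processor_infer.invalid_slices,
            no_prompt=False,
        )
        print(path, response)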

Sleepychord added the good first issue label on Oct 17, 2023
waltonfuture commented


How should I change the script to run inference on multiple GPUs (2×4090)?

1049451037 (Member) commented Oct 19, 2023

cli_demo.py and web_demo.py both support multiple GPUs. The commands to run them are given in README.md.

You can try simplifying them if you think they are not simple enough.

waltonfuture commented


I hit this error when running the code (error screenshot attached). Could you help with it?

1049451037 (Member) commented

It seems your CUDA driver is too old. Your PyTorch should be built against the CUDA version that your machine's driver supports.
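
A quick way to check which CUDA version your PyTorch build targets, and whether it can actually see the GPU, is standard PyTorch introspection, e.g.:

import torch

print(torch.__version__)           # PyTorch version
print(torch.version.cuda)          # CUDA version this PyTorch build was compiled against
print(torch.cuda.is_available())   # False usually indicates a driver/runtime mismatch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))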

waltonfuture commented

Thanks a lot! I have fixed the problem. By the way, does CogVLM support multiple images as input?

1049451037 (Member) commented

FYI: #38

xinsir6 (Author) commented Nov 14, 2023

Can you provide a faster version, such as 4-bit/8-bit quantization or multi-GPU inference?

1049451037 (Member) commented

FYI: #75
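
If the route in #75 goes through SAT's quantization kernels, the call might look roughly like the sketch below; the module path and function signature are assumptions here, so treat the linked issue and the repo demos as the authoritative reference.

# hypothetical sketch: quantize the already-loaded model with SAT's kernels,
# assuming sat.quantization.kernels.quantize exists in your SAT version
from sat.quantization.kernels import quantize

quantize(model, 8)  # 8-bit; pass 4 instead if 4-bit is supported
model = model.cuda()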

xinsir6 (Author) commented Nov 16, 2023


In this script, how do I set which GPU the model is loaded onto? I want to load all model parameters onto a single GPU card so that I can caption multiple images using multiple GPUs. However, I tried many settings for local_rank, rank, and device, but the parameters still get loaded onto GPU 0. Can you provide some advice?

(screenshots attached)

1049451037 (Member) commented

You should set CUDA_VISIBLE_DEVICES at the very beginning of your code, not in the middle of it.

Moreover, if you set the visible devices to 3, you should set device to cuda:0, because card 3 is now cuda:0 inside the process.
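
A minimal sketch of that advice in Python (the device index 3 is just the example from this thread; equivalently, export CUDA_VISIBLE_DEVICES=3 in the shell before launching):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"  # must run before torch / sat are imported

import torch

# with only physical card 3 visible, it is addressed as cuda:0 inside this process
device = "cuda:0"
print(torch.cuda.device_count())  # prints 1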

xinsir6 (Author) commented Nov 17, 2023

Yes, you are right. Respect!
