
Could you please provide a simple script to use your multimodal model, like Hugging Face or other multimodal models? #14

Closed
xinsir6 opened this issue Oct 11, 2023 · 13 comments
Labels: good first issue

xinsir6 commented Oct 11, 2023

(screenshot attached)
Thank you! Amazing results, and I hope to use your model to caption images!

1049451037 (Member) commented Oct 11, 2023

Here is a simplified script if you do not need model parallelism:

import argparse

import torch
from sat.model.mixins import CachedAutoregressiveMixin

from models.cogvlm_model import CogVLMModel
from utils.language import llama2_tokenizer, llama2_text_processor_inference
from utils.vision import get_image_processor
from utils.chat import chat

# load the CogVLM chat checkpoint for single-GPU inference (no model parallelism)
model, model_args = CogVLMModel.from_pretrained(
    "cogvlm-chat",
    args=argparse.Namespace(
        deepspeed=None,
        local_rank=0,
        rank=0,
        world_size=1,
        model_parallel_size=1,
        mode='inference',
        skip_init=True,
        fp16=False,
        bf16=True,
        use_gpu_initialization=True,
        device='cuda',
    ))
model = model.eval()

# tokenizer, image preprocessor, and KV-cache mixin for autoregressive decoding
tokenizer = llama2_tokenizer("lmsys/vicuna-7b-v1.5", signal_type="chat")
image_processor = get_image_processor(model_args.eva_args["image_size"][0])
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
text_processor_infer = llama2_text_processor_inference(tokenizer, None, model.image_length)

# run one chat turn on a single image
with torch.no_grad():
    response, history, cache_image = chat(
        "fewshot-data/kobe.png",
        model,
        text_processor_infer,
        image_processor,
        "Describe the image.",
        history=[],
        max_length=2048,
        top_p=0.4,
        temperature=0.8,
        top_k=1,
        invalid_slices=text_processor_infer.invalid_slices,
        no_prompt=False
        )
    print(response)

xinsir6 (Author) commented Oct 15, 2023

Thank you very much. I will try it to caption the images collected from the internet.
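
For that use case, a minimal captioning loop that reuses the objects from the script above might look like the sketch below; the image folder and glob pattern are placeholders, and each image is captioned independently with an empty history.

import glob

with torch.no_grad():
    for path in sorted(glob.glob("my_images/*.jpg")):  # placeholder folder and pattern
        response, _, _ = chat(
            path,
            model,
            text_processor_infer,
            image_processor,
            "Describe the image.",
            history=[],  # fresh history per image
            max_length=2048,
            top_p=0.4,
            temperature=0.8,
            top_k=1,
            invalid_slices=text_processor_infer.invalid_slices,
            no_prompt=False,
        )
        print(path, response)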

Sleepychord added the good first issue label on Oct 17, 2023
waltonfuture commented


How should I change the script to run inference on multiple GPUs (2×4090)?

1049451037 (Member) commented Oct 19, 2023

cli_demo.py and web_demo.py both support multiple GPUs. The commands to run them are given in README.md.

You can try simplifying them if you think they are not simple enough.

waltonfuture commented


I hit this error when running the code (error screenshot attached). Could you help with it?

1049451037 (Member) commented

It seems your CUDA driver is too old. Your PyTorch should be built against the CUDA version that your machine's driver supports.
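
A quick way to check which CUDA version your PyTorch build targets, and whether it can actually see the GPU, is standard PyTorch introspection, e.g.:

import torch

print(torch.__version__)           # PyTorch version
print(torch.version.cuda)          # CUDA version this PyTorch build was compiled against
print(torch.cuda.is_available())   # False usually indicates a driver/runtime mismatch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))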

waltonfuture commented

Thanks a lot! I have fixed the problem. By the way, does CogVLM support multiple images as input?

1049451037 (Member) commented

FYI: #38

xinsir6 (Author) commented Nov 14, 2023

Can you provide a faster version, such as 4-bit/8-bit quantization or multi-GPU inference?

1049451037 (Member) commented

FYI: #75
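
If the route in #75 goes through SAT's quantization kernels, the call might look roughly like the sketch below; the module path and function signature are assumptions here, so treat the linked issue and the repo demos as the authoritative reference.

# hypothetical sketch: quantize the already-loaded model with SAT's kernels,
# assuming sat.quantization.kernels.quantize exists in your SAT version
from sat.quantization.kernels import quantize

quantize(model, 8)  # 8-bit; pass 4 instead if 4-bit is supported
model = model.cuda()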

xinsir6 (Author) commented Nov 16, 2023


In this script, how do I set which GPU the model is loaded onto? I want to load all model parameters onto a single GPU card so that I can caption multiple images using multiple GPUs. However, I tried many settings for local_rank, rank, and device, but the parameters still get loaded onto GPU 0. Can you provide some advice?

(screenshots attached)

1049451037 (Member) commented

You should set CUDA_VISIBLE_DEVICES at the very beginning of your code, not in the middle of it.

Moreover, if you set the visible devices to 3, you should set device to cuda:0, because card 3 is now cuda:0 inside the process.
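
A minimal sketch of that advice in Python (the device index 3 is just the example from this thread; equivalently, export CUDA_VISIBLE_DEVICES=3 in the shell before launching):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"  # must run before torch / sat are imported

import torch

# with only physical card 3 visible, it is addressed as cuda:0 inside this process
device = "cuda:0"
print(torch.cuda.device_count())  # prints 1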

xinsir6 (Author) commented Nov 17, 2023

Yes, you are right. Respect!
