demo.py caption result is not the same with the online demo #40

Closed
trouble-maker007 opened this issue Jan 3, 2024 · 3 comments

Comments


trouble-maker007 commented Jan 3, 2024

@Yuliang-Liu Using the demo.py script, the caption result is: "333 Smooth lighting, perfect shading. Intricate and mesmerizing, surrounding finely shattered self-luminous rainbow."
What parameter settings does your online demo use?

>>> kwargs = dict()
>>> kwargs['fp16'] = True
>>> kwargs['bf16'] = False
>>> model = MonkeyLMHeadModel.from_pretrained(checkpoint, device_map='cuda', **kwargs).eval()
>>> tokenizer = QWenTokenizer.from_pretrained(checkpoint)
>>> tokenizer.padding_side = 'left'
>>> tokenizer.pad_token_id = tokenizer.eod_id

>>> print(query)
<img>7c844f8f477e79c8dad934a907337f31_3</img> Write a comprehensive and concise caption and style of the image using the original caption:: "anime style.The latest flat anime character design artwork has hyper-exceptional amount of finely beautiful details, which is delicately generated by the most technically skilled illustrator. The best framing and the best composition from Hatsune Miku's hip to her frontal face. Being in highly fashionable feminine clothing. All the features and proportions and shapes of her face and eyes and hair and her perfect feminine body are delicately super precisely reproduced original Hatsune Miku of the THE VOCALOID official artworks true to life, the bishoujo's luscious loving pose. Pale color.::333 Smooth lighting, perfect shading. Intricate and mesmerizing, surrounding finely shattered self-luminous rainbow.::77 Letter.::-0.1 "

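>>> # Tokenization step missing from the paste; assumed to be the standard tokenizer call.
>>> input_ids = tokenizer(query, return_tensors='pt', padding='longest')
>>> attention_mask = input_ids.attention_mask
>>> input_ids = input_ids.input_ids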
>>> pred = model.generate(
...             input_ids=input_ids.cuda(),
...             attention_mask=attention_mask.cuda(),
...             do_sample=True,
...             temperature=0.7,
...             max_new_tokens=250,
...             min_new_tokens=1,
...             length_penalty=3,
...             num_return_sequences=1,
...             output_hidden_states=True,
...             use_cache=True,
...             pad_token_id=tokenizer.eod_id,
...             eos_token_id=tokenizer.eod_id,
...             )

>>> response = tokenizer.decode(pred[0][input_ids.size(1):].cpu(), skip_special_tokens=True).strip()
>>> print(response)
333 Smooth lighting, perfect shading. Intricate and mesmerizing, surrounding finely shattered self-luminous rainbow.

But the online demo gives a different result:
[Screenshot: 企业微信20240103-221040@2x]

The image being captioned is:
[Image: 7c844f8f477e79c8dad934a907337f31_3]

@ShuoZhang2003
Collaborator

Thank you for your attention!

The model used in the paper, the open-source model, and the model used in `demo.py` are all the same original model version. For the caption task, we set the prompt to "Generate the detailed caption in English:"

Later on, we trained a chat version of the model using some publicly available data, but we haven't released its weights yet. To limit maintenance costs, we currently keep only one online demo, demo_chat, and it runs the chat version of the model.

The "Generate" button in the online demo_chat performs the caption task, and its prompt is fixed in our code to "Describe the image in as much detail as possible in English, including as many elements from the image as possible, but without repetition. Answer: ". Only the "Submit" button sends the content of the input box to the model.
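
To make the difference concrete, here is a minimal sketch of how the two caption queries could be assembled. The `<img>...</img>` wrapper follows the query printed earlier in this thread, and the two prompt strings are the ones quoted above. The helper name `build_query` and the constant names are hypothetical, not taken from `demo.py`, and exact whitespace around the prompt may matter to the model.

```python
# Prompt strings as quoted in this thread; names below are illustrative only.
BASE_CAPTION_PROMPT = "Generate the detailed caption in English:"
CHAT_CAPTION_PROMPT = (
    "Describe the image in as much detail as possible in English, "
    "including as many elements from the image as possible, "
    "but without repetition. Answer: "
)


def build_query(image_ref: str, prompt: str) -> str:
    """Wrap the image reference in <img> tags and append the task prompt."""
    return f"<img>{image_ref}</img> {prompt}"


# Image reference taken from the original report.
image_ref = "7c844f8f477e79c8dad934a907337f31_3"
base_query = build_query(image_ref, BASE_CAPTION_PROMPT)
chat_query = build_query(image_ref, CHAT_CAPTION_PROMPT)

print(base_query)
print(chat_query)
```

A custom instruction such as the one in the original report is a third, different query, so it is not expected to reproduce what the "Generate" button returns.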


WilTay1 commented Feb 2, 2024

@Yuliang-Liu Is there any plan to release the new model? I think the online demo performs really well.

Collaborator

echo840 commented Mar 11, 2024

> @Yuliang-Liu Is there any plan to release the new model? I think the online demo performs really well.

Thank you for your attention. We have open-sourced the weights for Monkey-Chat. You can find them at https://huggingface.co/echo840/Monkey-Chat
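
For readers who want to try the released weights, the following is a rough sketch of loading them with the Hugging Face `transformers` auto classes. It assumes the `echo840/Monkey-Chat` repository ships custom modeling and tokenizer code that can be loaded with `trust_remote_code=True`, and that the tokenizer exposes `eod_id` like the one used earlier in this thread. The helper name `load_monkey_chat` is hypothetical. Check the model card for the authoritative instructions.

```python
CHECKPOINT = "echo840/Monkey-Chat"  # repository named in the comment above


def load_monkey_chat(checkpoint: str = CHECKPOINT, device_map: str = "cuda"):
    """Load model and tokenizer.

    Assumption: the repository provides custom code usable via trust_remote_code.
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, device_map=device_map, trust_remote_code=True
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
    # Same padding setup as the snippet earlier in this thread
    # (assumes the tokenizer has an eod_id attribute).
    tokenizer.padding_side = "left"
    tokenizer.pad_token_id = tokenizer.eod_id
    return model, tokenizer
```

Calling `model, tokenizer = load_monkey_chat()` downloads the weights on first use and needs a CUDA device with the default `device_map`.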

echo840 closed this as completed on Mar 11, 2024