questions about training #403

Open
Hickey8 opened this issue Jul 17, 2024 · 0 comments
Hickey8 commented Jul 17, 2024

I appreciate your excellent work! I have some questions about the training.
First, what is the difference between IPAdapterFull and IPAdapterPlus? I find that only the image_proj_model differs. Maybe MLPProjModel extracts more detailed features?
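
For reference, here is a minimal sketch of the two projection heads as I understand them; the class names, dimensions, and layer counts are simplified and illustrative, not the repo's exact code. The "Full" variant projects every CLIP patch token through an MLP, so it keeps one conditioning token per patch, while the "Plus" variant compresses the patch tokens into a small fixed set of learnable query tokens with a perceiver-style resampler:

```python
import torch
import torch.nn as nn

class MLPProjModel(nn.Module):
    """IPAdapterFull-style head (sketch): a plain MLP applied per CLIP
    patch token, so every patch feature becomes one conditioning token."""
    def __init__(self, clip_dim=1024, cross_attention_dim=768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, clip_dim),
            nn.GELU(),
            nn.Linear(clip_dim, cross_attention_dim),
            nn.LayerNorm(cross_attention_dim),
        )

    def forward(self, patch_embeds):    # (B, N_patches, clip_dim)
        return self.proj(patch_embeds)  # (B, N_patches, cross_attention_dim)

class SimpleResampler(nn.Module):
    """IPAdapterPlus-style head (simplified): a fixed set of learnable
    query tokens cross-attends to the CLIP patch features, distilling
    them into num_queries conditioning tokens."""
    def __init__(self, clip_dim=1024, cross_attention_dim=768,
                 num_queries=16, num_heads=8, depth=2):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_queries, cross_attention_dim))
        self.proj_in = nn.Linear(clip_dim, cross_attention_dim)
        self.layers = nn.ModuleList([
            nn.MultiheadAttention(cross_attention_dim, num_heads, batch_first=True)
            for _ in range(depth)
        ])
        self.norm = nn.LayerNorm(cross_attention_dim)

    def forward(self, patch_embeds):                  # (B, N_patches, clip_dim)
        x = self.proj_in(patch_embeds)
        q = self.latents.unsqueeze(0).expand(x.size(0), -1, -1)
        for attn in self.layers:
            out, _ = attn(q, x, x)                    # queries attend to patches
            q = q + out                               # residual update
        return self.norm(q)                           # (B, num_queries, cross_attention_dim)
```

If I read it correctly, the downstream UNet cross-attention is the same in both cases; only the number and content of the image tokens fed into it changes.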
Second, when training with text-image pairs, are the noisy GT and the image prompt the same image? I would think that, in general, the training sample should be a triplet of (image prompt, text prompt, GT image) if the model is to obtain multimodal-input capability. Meanwhile, you mention in the paper that it is also possible to train the model without a text prompt, since the image prompt alone is informative enough to guide the final generation. What is the GT image when training degrades to an img2img task without text prompts?
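
To make the question concrete, here is roughly how I read the pair-based training setup, as a hedged sketch: unet_with_ip and the batch keys are placeholder names I made up, not the repo's actual code. The point is that one image plays both roles, encoded once by the VAE as the diffusion target and once by the CLIP image encoder as the image prompt:

```python
import torch
import torch.nn.functional as F

def training_step(batch, vae, clip_image_encoder, text_encoder,
                  unet_with_ip, noise_scheduler):
    # Assumption: the same image serves as GT and as image prompt.
    image = batch["pixel_values"]            # GT image (VAE preprocessing)
    clip_image = batch["clip_pixel_values"]  # same image (CLIP preprocessing)

    # Diffusion target: noised VAE latents of the GT image.
    latents = vae.encode(image).latent_dist.sample() * 0.18215
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, t)

    # Conditioning: image prompt (and optionally a text prompt, which
    # can be dropped/emptied for the image-only setting).
    image_embeds = clip_image_encoder(clip_image).last_hidden_state
    text_embeds = text_encoder(batch["input_ids"])[0]

    # unet_with_ip is a hypothetical wrapper that injects the projected
    # image tokens into the UNet's cross-attention layers.
    noise_pred = unet_with_ip(noisy_latents, t, text_embeds, image_embeds)
    return F.mse_loss(noise_pred, noise)
```

Under this reading, dropping the text prompt does not change the GT: the target is still the noised latents of the very image that supplies the prompt, which is what makes it look like an img2img-style reconstruction task.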
Thanks again for your excellent contributions!
