Thank you for your excellent work! I have some confusion about the training.
First, what's the difference between IPAdapterFull and IPAdapterPlus? As far as I can tell, only the image_proj_model differs. Does MLPProjModel extract more detailed features?
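For reference, here is my rough understanding of the two projection models (a minimal sketch only; TinyResampler is my own simplification of the repo's Resampler, and the dimensions are illustrative rather than the actual defaults):

```python
import torch
import torch.nn as nn

# IPAdapterFull: MLPProjModel maps the CLIP patch tokens straight through
# an MLP, producing one context token per input patch token.
class MLPProjModel(nn.Module):
    def __init__(self, cross_attention_dim=768, clip_embeddings_dim=1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(clip_embeddings_dim, clip_embeddings_dim),
            nn.GELU(),
            nn.Linear(clip_embeddings_dim, cross_attention_dim),
            nn.LayerNorm(cross_attention_dim),
        )

    def forward(self, image_embeds):      # (B, 257, clip_dim)
        return self.proj(image_embeds)    # (B, 257, cross_attention_dim)

# IPAdapterPlus: a perceiver-style Resampler compresses the patch tokens
# into a small, fixed number of learned query tokens via cross-attention.
class TinyResampler(nn.Module):
    def __init__(self, dim=768, clip_embeddings_dim=1024,
                 num_queries=16, heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(1, num_queries, dim))
        self.proj_in = nn.Linear(clip_embeddings_dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_embeds):      # (B, 257, clip_dim)
        x = self.proj_in(image_embeds)
        q = self.latents.expand(x.shape[0], -1, -1)
        out, _ = self.attn(q, x, x)       # queries attend to all patch tokens
        return self.norm(out)             # (B, num_queries, dim)
```

If that reading is right, IPAdapterFull keeps all patch tokens (more detail, more compute), while IPAdapterPlus distills them into a handful of tokens, so "more detailed features" would describe the Full variant rather than Plus. Is that correct?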
Second, when training with text-image pairs, are the noisy GT and the image prompt the same image? In general, I would expect the training sample to be a triplet of (image prompt, text prompt, GT image) if the model is meant to acquire multimodal-input capability. Meanwhile, you mention in the paper that it is also possible to train the model without a text prompt, since the image prompt alone is informative enough to guide the final generation. What is the GT image when training degrades to an img2img task without text prompts?
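To make the question concrete, here is how I currently picture one training step (a hypothetical sketch; clip_image_encoder, image_proj_model, unet, vae, and scheduler are stand-in callables, not the repo's actual objects or signatures):

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: the image prompt is the clean GT image itself, and
# the "noisy GT" is a noised latent of that same image.
def training_step(gt_image, text_embeds, clip_image_encoder,
                  image_proj_model, unet, vae, scheduler):
    # 1. Image prompt = the clean GT image, encoded by CLIP and projected.
    image_embeds = clip_image_encoder(gt_image)
    ip_tokens = image_proj_model(image_embeds)

    # 2. Noisy GT = the same image, encoded to latents and noised.
    latents = vae.encode(gt_image)
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.num_train_timesteps, (latents.shape[0],))
    noisy_latents = scheduler.add_noise(latents, noise, t)

    # 3. Condition on text tokens + image tokens; text_embeds could be the
    #    empty-prompt embedding when training without text (and either
    #    condition can be dropped for classifier-free guidance).
    cond = torch.cat([text_embeds, ip_tokens], dim=1)
    noise_pred = unet(noisy_latents, t, cond)

    # 4. Standard diffusion objective against the added noise.
    return F.mse_loss(noise_pred, noise)
```

Under that picture, the GT image in the image-only case is still the prompt image itself, which is why it looks like an img2img reconstruction task to me. Is that how it actually works?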
Thanks again for your excellent contributions!