Multiple image embeds in one prompt? #12
Ah, I just found this issue on the LLaVA repository, and it looks like this would require a different training approach:
Right, the current version doesn't have the ability to understand multiple images. I expect to train a version that addresses this in the near future. Can I ask what types of comparisons you're interested in?
Ideally it would be capable of any kind of comparison, but for a start it would already be nice if it were able to point out things like "object A is only present in image 1" or "image 1 is a photo, whilst image 2 is a comic, but both are portraits".
It took a bit of prompt engineering and image stitching, but I got moondream1 to compare rudimentary images.
What I learned: I think some fine-tuning on answering questions about multiple photos stitched into one image would help.
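The stitching workaround mentioned above can be sketched in a few lines. This is not the commenter's actual code; it is a minimal illustration (the function name `stitch_side_by_side` and the use of raw numpy arrays are my assumptions) of how two images of different heights might be padded and placed next to each other before being sent to the model as a single image:

```python
import numpy as np

def stitch_side_by_side(img_a, img_b):
    # Hypothetical helper, not from the moondream codebase.
    # Pad the shorter image with black rows so heights match,
    # then place the two images next to each other horizontally.
    h = max(img_a.shape[0], img_b.shape[0])
    pad = lambda im: np.pad(im, ((0, h - im.shape[0]), (0, 0), (0, 0)))
    return np.concatenate([pad(img_a), pad(img_b)], axis=1)
```

The combined array can then be converted back to a single image and passed through the model's normal single-image pipeline, with a prompt such as "compare the left and right halves of this image".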
By slightly augmenting the code, I was trying to embed two images into the prompt in the hope that the model would be able to make comparisons between them. So far, however, it looks like it always just sees the last embed. I am wondering whether this approach is feasible at all, and what would be required to make it work.
This is my change in sample.py:
And this is my change in text_model.py:
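The actual diffs aren't shown above, but the general idea of the attempt can be sketched. The following is a hedged illustration, not the real sample.py or text_model.py code: `embed_text`, `build_inputs_embeds`, and the embedding width `D` are hypothetical stand-ins. The point is that both image embeddings must be interleaved into one input-embedding sequence, rather than the second embed replacing the first:

```python
import numpy as np

D = 16  # hypothetical embedding width; the real model's differs

def embed_text(tokens):
    # Stand-in for the model's token-embedding layer.
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(tokens), D))

def build_inputs_embeds(img1_embeds, img2_embeds, question_tokens):
    # Concatenate both image embeddings with short connecting text
    # into a single (sequence_length, D) matrix, so the model can
    # attend to both images instead of only the last embed.
    parts = [
        embed_text(["<image", "1>:"]),
        img1_embeds,
        embed_text(["<image", "2>:"]),
        img2_embeds,
        embed_text(question_tokens),
    ]
    return np.concatenate(parts, axis=0)
```

Even if the sequence is built this way, the thread above suggests the model would still need fine-tuning on multi-image data before the comparisons become meaningful.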