[FEAT]: Vision Models and embedding Documents #3395

kalfbz · 2025-03-05T00:38:29Z

What would you like to see?

Vision models such as gemini-2.0-flash can extract information from images. Currently, though, an image or document uploaded in the chatbox is embedded into the workspace instead of being sent to the model for interpretation.

Would it be possible to modify this behavior so that uploading an image or document in the chatbox sends it to the model for interpretation, while files intended for embedding into the workspace should be uploaded via the upload button in the left-side workspace?
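
To make the requested behavior concrete, here is a minimal sketch of the routing rule, keyed on where the upload came from. All names (`UploadOrigin`, `Destination`, `route_upload`) are hypothetical and only illustrate the idea; they are not taken from the project's code.

```python
from enum import Enum


class UploadOrigin(Enum):
    """Where the user dropped the file (hypothetical names)."""
    CHAT_BOX = "chat_box"
    WORKSPACE_PANEL = "workspace_panel"


class Destination(Enum):
    """What should happen to the file (hypothetical names)."""
    SEND_TO_MODEL = "send_to_model"
    EMBED_IN_WORKSPACE = "embed_in_workspace"


def route_upload(origin: UploadOrigin) -> Destination:
    """Chat uploads go to the model; workspace uploads get embedded."""
    if origin is UploadOrigin.CHAT_BOX:
        return Destination.SEND_TO_MODEL
    return Destination.EMBED_IN_WORKSPACE
```

With this rule the upload location alone decides the outcome, so the user never has to pick a mode explicitly.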

th3f001 · 2025-03-08T00:10:44Z

...or maybe decouple this functionality from the workspace embedder (which I assume is meant for text-based files that get split into chunks before being sent to the model) and make a Custom Skill out of it?

So at that point the user would have two separate flows:
-> standard files to run RAG on -> workspace file embedder
-> image files to be passed to a vision-enabled model -> Agent with a dedicated Skill

Following the same approach, we could also have a third flow:
-> PDFs and other mixed-content files to be passed to an OCR-enabled model (like the latest Mistral-OCR) -> Agent with a dedicated Skill
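
A rough sketch of how the three flows could be selected by file type. The extension sets and the flow names (`vision_skill`, `ocr_skill`, `workspace_embedder`) are assumptions made up for illustration, not existing skills or settings.

```python
from pathlib import Path

# Assumed extension lists; a real implementation would likely inspect MIME types.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
OCR_EXTENSIONS = {".pdf"}


def pick_flow(filename: str) -> str:
    """Return the name of the (hypothetical) flow that should handle the file."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "vision_skill"        # Agent Skill backed by a vision-enabled model
    if suffix in OCR_EXTENSIONS:
        return "ocr_skill"           # Agent Skill backed by an OCR-enabled model
    return "workspace_embedder"      # default: chunk and embed for RAG
```

Falling back to the workspace embedder keeps today's behavior for every file type that has no dedicated Skill.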

kalfbz added the enhancement and feature request labels · Mar 5, 2025

timothycarambat assigned timothycarambat and shatfield4 and unassigned timothycarambat · Mar 9, 2025
