-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow using images as input with gpt-4 vision #286
Conversation
Apply Sweep Rules to your PR?
|
This might be worth filing upstream with the https://github.com/openai/openai-node repo itself, since it doesn't seem to be an OpenAI issue. |
Deploying with Cloudflare Pages
|
Yeah, I think the max_tokens: 4096, I found someone said model:
| (string & {})
| 'gpt-4'
| 'gpt-4-0314'
| 'gpt-4-0613'
| 'gpt-4-32k'
| 'gpt-4-32k-0314'
| 'gpt-4-32k-0613'
| 'gpt-3.5-turbo'
| 'gpt-3.5-turbo-16k'
| 'gpt-3.5-turbo-0301'
| 'gpt-3.5-turbo-0613'
| 'gpt-3.5-turbo-16k-0613'; maybe these given models impact the default max_tokens of |
@mingming-ma this is coming along really well, I'm impressed. A few things I notice testing this today:
We should probably alter the message component so that it can show the image that was attached. Keep going, you're doing great work here! |
@humphd Thanks a lot for the feedback! I just tested store images in db, now we can keep images after submit. Note that since I changed the db schema, better try this in the Incognito Window so that not impact existing IndexedDB. I also tested in my normal Window and seems working well. |
@mingming-ma do you want to fix the conflicts you have in this branch? |
@humphd Sure, seems no conflicts on my side for now, let me know I'll fix them. |
For some reason, this is showing:
At any rate, some more feedback. First, I love this. I showed it to some other people and they were blown away too. Some comments they made:
Another optimization we should make: what is the max size of the image that OpenAI will process? We should resize the image in the browser so we don't waste bandwidth when sending. Can happen in follow-up. |
@humphd Oh, It must be that I merged the main branch midway. Thanks a lot for the feed back! I was thinking about implementing the paste function yesterday, and we had the same thought! I'll update the width and check image size for the optimization. |
78fab1d
to
8146974
Compare
@humphd I've done a rebase. Are there still any conflicts? |
No, looks good. I want to do a pass over the code too, I haven't read it yet. I'll try to do that this week. |
That's great! This PR involves many components, so no rush on the review. Honestly, the code implementation is still quite rough. Can’t wait to see your thoughts on it! |
Another comment from user testing:
I think this is good feedback. We should re-focus the prompt so the user can start typing right away. |
Nice catch 👍 just fixed that |
</Markdown> | ||
<> | ||
{image.map((image, index) => ( | ||
<img |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to use https://chakra-ui.com/docs/components/image/usage here, and have it fill the width better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed at cb915e3
@@ -0,0 +1,52 @@ | |||
import { useState, useRef } from "react"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename Clip
everywhere, including the filename, to be Attach
or something that better describes what this is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed at 3dea32b
|
||
export default function ClipIcon({ isDisabled = false, onFileSelected }: ClipIconProps) { | ||
const isMobile = useMobileBreakpoint(); | ||
const [colorScheme /*, setColorScheme */] = useState<"blue" | "red">("blue"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you don't need set, please remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed at c65d529
src/lib/ChatCraftMessage.ts
Outdated
const text = this.text; | ||
|
||
const textAndImage: OpenAI.Chat.Completions.ChatCompletionContentPart[] = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is oddly named, since it might not have an image in it. I'd call it content
or something more generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed at 21c96ef
src/lib/ChatCraftMessage.ts
Outdated
|
||
const textAndImage: OpenAI.Chat.Completions.ChatCompletionContentPart[] = []; | ||
textAndImage.push({ type: "text", text: this.text }); | ||
if (this.image && this.image.length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this.image
ever be undefined here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed at fba7bec
src/lib/db.ts
Outdated
@@ -19,7 +19,8 @@ export type ChatCraftMessageTable = { | |||
user?: User; | |||
func?: FunctionCallParams | FunctionCallResult; | |||
text: string; | |||
versions?: { id: string; date: Date; model: string; text: string }[]; | |||
image: string[]; | |||
versions?: { id: string; date: Date; model: string; text: string; image: string[] }[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think image: string[]
has to be optional, since no existing data has it. If you want to include it like you have here, you need to run a migration step to add []
to all old messages in the db.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed at 5d3143b
82d5089
to
7b46b98
Compare
@mingming-ma can you resolve the comments I raised above that have been fixed by your recent work? |
@humphd Absolutely! I didn't have much free time as the end of the semester, but now that it's behind me, I'm committed to making the necessary changes. Thanks a lot for your feedback, and if you can give me another two to three days, I'll do my best to get it done. |
@humphd I think I have fixed most except the
I tried to do factor but haven't found a good approach yet. Can you give me a few more hints, or should we consider addressing this in a future pull request? |
I've been playing around with this feature for a bit and it's really neat! Should the prompt be re-focused after closing the full image? |
* Fix image styling when using gpt4-vision * Allow better scaling of images
ffcd269
to
11ebc37
Compare
@humphd Got it! Rebased done, just |
Excited! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing. A couple small things, and this is good to go from my perspective.
I'm so happy to see this finished, @mingming-ma! Thank you for following through on what you started in the fall. This is an epic feature. People are going to be blown away.
After setting this value, seems no cutoff | ||
Tested on version openai@4.24.7, without max_tokens response still cutoff | ||
*/ | ||
max_tokens: model.supportsImages ? 4096 : undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this may come back to burn us. Let's try to remember that we did this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will check later the newest version frequently.
@humphd Thanks a bunch for all the feedback! I'm also very excited to tackle this and it's my biggest PR yet! 😄 Luckily, it will be landed soon! |
@mingming-ma bizarre, I wonder if the GitHub UI was lagging. It wasn't there when I reviewed. I apologize. Thanks for drawing my attention to this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ship this 🚢 🚢 🚢!
No worries! Now it is my turn for the lagging 😄 Will continue do the mobile version. Thanks all the feedback again! |
@mingming-ma when you're happy, please merge so we can avoid rebase issues. |
Fix #285
Result
Tasks
max_tokens
. Tested on latest version openai@v4.20.0, without max_tokens response still cutoffcursor: pointer
Issues
Unresolved task: Resize the image in the browser within the max size of the image that OpenAI will processFoundUncaught ReferenceError: process is not defined
when trying to dosharp
convert/resize images, guessing it needs server side running so cause this error.-> Resize the image in the browser within the max size of the image that OpenAI will process #395
cutoff bug workaround reminder-> Remove workaround hardcode in the vision model #566