Allow using images as input with gpt-4 vision #286

mingming-ma · 2023-11-10T23:19:39Z

Fix #285

Result

Tasks

Issues

~~Unresolved task: Resize the image in the browser within the max size of the image that OpenAI will process~~
- ~~Found `Uncaught ReferenceError: process is not defined` when trying to convert/resize images with sharp; my guess is that sharp needs to run server-side, which causes this error.~~
  -> Resize the image in the browser within the max size of the image that OpenAI will process #395
~~cutoff bug workaround reminder~~ -> Remove workaround hardcode in the vision model #566

sweep-ai · 2023-11-10T23:20:43Z

Apply Sweep Rules to your PR?

Apply: All new business logic should have corresponding unit tests.
Apply: Refactor large functions to be more modular.

mingming-ma · 2023-11-11T00:10:37Z

Response cutoff bug investigation

Results from Python scripts for local images vs. ChatCraft

humphd · 2023-11-11T01:25:47Z

This might be worth filing upstream with the https://github.com/openai/openai-node repo itself, since it doesn't seem to be an OpenAI issue.

cloudflare-pages · 2023-11-11T17:06:04Z

Deploying with Cloudflare Pages

Latest commit:	`f679e4e`
Status:	✅ Deploy successful!
Preview URL:	https://f857fa27.console-overthinker-dev.pages.dev
Branch Preview URL:	https://issue-285.console-overthinker-dev.pages.dev


mingming-ma · 2023-11-11T17:19:09Z

> This might be worth filing upstream with the https://github.com/openai/openai-node repo itself, since it doesn't seem to be an OpenAI issue.

Yeah, I think openai-node will be updated in the next few days. I also found something else: after setting `max_tokens` in `chatCompletionParams`, the response no longer seems to be cut off.

    max_tokens: 4096,

Result:

I found someone saying that `max_tokens` defaults to the maximum value for the current model. Looking into the openai-node code, `ChatCompletionCreateParamsBase` currently only lists these models:

  model:
    | (string & {})
    | 'gpt-4'
    | 'gpt-4-0314'
    | 'gpt-4-0613'
    | 'gpt-4-32k'
    | 'gpt-4-32k-0314'
    | 'gpt-4-32k-0613'
    | 'gpt-3.5-turbo'
    | 'gpt-3.5-turbo-16k'
    | 'gpt-3.5-turbo-0301'
    | 'gpt-3.5-turbo-0613'
    | 'gpt-3.5-turbo-16k-0613';

Maybe this list of models affects the default `max_tokens` for `gpt-4-vision-preview`. I think I can leave this for now and work on the other tasks.
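The `max_tokens` workaround described above can be sketched as follows. This is a minimal illustration under assumptions, not ChatCraft's actual code: `Model`, `buildChatParams`, and `VISION_MAX_TOKENS` are hypothetical names, and the types are hand-written stand-ins for the openai-node types so the snippet runs on its own. The `4096` value is the one used in this thread.

```typescript
// Hand-written stand-ins for the shape of OpenAI chat request params
// (NOT imported from openai-node, so this sketch runs on its own).
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

type Message = { role: "system" | "user" | "assistant"; content: string | ContentPart[] };

type ChatParams = {
  model: string;
  messages: Message[];
  max_tokens?: number;
};

// Hypothetical model descriptor with a flag for image support.
type Model = { name: string; supportsImages: boolean };

// Workaround: only vision models get an explicit max_tokens, because leaving it
// unset appeared to truncate their responses.
const VISION_MAX_TOKENS = 4096;

function buildChatParams(model: Model, messages: Message[]): ChatParams {
  const params: ChatParams = { model: model.name, messages };
  if (model.supportsImages) {
    params.max_tokens = VISION_MAX_TOKENS;
  }
  return params;
}

const vision: Model = { name: "gpt-4-vision-preview", supportsImages: true };
const textModel: Model = { name: "gpt-4", supportsImages: false };
const msgs: Message[] = [{ role: "user", content: "Describe this image" }];

console.log(buildChatParams(vision, msgs).max_tokens); // 4096
console.log(buildChatParams(textModel, msgs).max_tokens); // undefined
```

Omitting the key for non-vision models keeps their requests unchanged; since `JSON.stringify` drops `undefined` values, sending `max_tokens: undefined` should serialize the same way.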

humphd · 2023-11-23T13:52:57Z

@mingming-ma this is coming along really well, I'm impressed. A few things I notice testing this today:

- I wonder if the close X circle could go over the top of the image (maybe the top-right corner?) instead of below it. Could happen in a follow-up, or someone else could do it.
- I want to be able to click on the small version of the image in the prompt form and have it go big so I can see it better (like a lightbox that shows in the centre of the screen and that you can close?). Could happen in a follow-up, or someone else could do it.
- I love how the model changes automatically when I attach an image; that's smart.
- When I send the image + prompt, I lose the image:

We should probably alter the message component so that it can show the image that was attached.

Keep going, you're doing great work here!
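The automatic model switch mentioned above could be expressed as a small pure function. This is a hedged sketch, not how the PR implements it: `chooseModel`, the `Model` shape, and passing the vision model in explicitly are assumptions for illustration.

```typescript
// Hypothetical model descriptor for this sketch only.
type Model = { name: string; supportsImages: boolean };

// Decide which model the next request should use. If images are attached and the
// current model cannot accept them, switch to the given vision-capable model;
// otherwise keep whatever the user picked.
function chooseModel(current: Model, imageCount: number, visionModel: Model): Model {
  if (imageCount > 0 && !current.supportsImages) {
    return visionModel;
  }
  return current;
}

const gpt4: Model = { name: "gpt-4", supportsImages: false };
const gpt4Vision: Model = { name: "gpt-4-vision-preview", supportsImages: true };

console.log(chooseModel(gpt4, 1, gpt4Vision).name); // gpt-4-vision-preview
console.log(chooseModel(gpt4, 0, gpt4Vision).name); // gpt-4
```

Keeping this as a pure function makes it easy to call from a React effect that watches the attached images without tying the decision to any component.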

mingming-ma · 2023-11-23T16:24:01Z

@humphd Thanks a lot for the feedback! I just tested storing images in the db, so images are now kept after submitting. Note that since I changed the db schema, it's better to try this in an Incognito window so it doesn't affect your existing IndexedDB. I also tested it in my normal window and it seems to work well.

humphd · 2023-11-24T18:26:07Z

@mingming-ma do you want to fix the conflicts you have in this branch?

mingming-ma · 2023-11-24T19:17:48Z

@humphd Sure, I don't see any conflicts on my side for now. Let me know if you see any and I'll fix them.

humphd · 2023-11-29T15:05:10Z

For some reason, this is showing:

This branch cannot be rebased due to conflicts

At any rate, some more feedback.

First, I love this. I showed it to some other people and they were blown away too. Some comments they made:

- Should be able to paste an image and have it get attached to the current prompt (can happen in a follow-up).
- Images in messages should use a width of 100%, and be clickable to see the full version.

Another optimization we should make: what is the max size of the image that OpenAI will process? We should resize the image in the browser so we don't waste bandwidth when sending. Can happen in follow-up.
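The resize question boils down to computing target dimensions that fit a size limit while keeping the aspect ratio. A minimal sketch under assumptions: `fitWithin` is a hypothetical helper, and `MAX_SIDE = 2048` is an assumed limit for illustration only (check OpenAI's vision documentation for the real value). In the browser, the resulting size would typically be used to draw the image onto a `<canvas>` with `drawImage()` and export it with `toBlob()` or `toDataURL()`; that part is not shown here.

```typescript
// Compute target dimensions that fit inside a maxWidth x maxHeight box while
// preserving the aspect ratio. Images that already fit are left unchanged.
function fitWithin(
  width: number,
  height: number,
  maxWidth: number,
  maxHeight: number
): { width: number; height: number } {
  // A scale of 1 means "no resize"; we never upscale small images.
  const scale = Math.min(1, maxWidth / width, maxHeight / height);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// Assumed limit for illustration only.
const MAX_SIDE = 2048;
console.log(fitWithin(4000, 3000, MAX_SIDE, MAX_SIDE)); // { width: 2048, height: 1536 }
console.log(fitWithin(800, 600, MAX_SIDE, MAX_SIDE)); // { width: 800, height: 600 }
```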

mingming-ma · 2023-11-29T20:50:36Z

@humphd Oh, it must be because I merged the main branch midway. Thanks a lot for the feedback! I was thinking about implementing the paste function yesterday, so we had the same thought! I'll update the width and check the image size for the optimization.
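A paste handler mostly needs to pull image files out of the clipboard's `DataTransfer`. The sketch below relies on the standard browser behaviour (`DataTransfer.types` contains `"Files"` when files are present, and each `DataTransferItem` has `kind`, `type`, and `getAsFile()`), but uses small structural interfaces with hypothetical names (`ItemLike`, `TransferLike`, `getPastedImages`) so it can run outside a browser. It is an illustration, not the PR's implementation.

```typescript
// Structural stand-ins for the browser's DataTransferItem / DataTransfer.
// In a real paste handler you would pass `event.clipboardData`.
interface ItemLike<F> {
  kind: string; // "file" or "string"
  type: string; // MIME type, e.g. "image/png"
  getAsFile(): F | null;
}

interface TransferLike<F> {
  types: ReadonlyArray<string>;
  items: ArrayLike<ItemLike<F>>;
}

// Return the pasted image files, or an empty array if the clipboard holds none.
function getPastedImages<F>(data: TransferLike<F> | null): F[] {
  // Array.from also handles older browsers where `types` was a DOMStringList.
  if (!data || !Array.from(data.types).includes("Files")) {
    return [];
  }
  const files: F[] = [];
  for (const item of Array.from(data.items)) {
    if (item.kind === "file" && item.type.startsWith("image/")) {
      const file = item.getAsFile();
      if (file) files.push(file);
    }
  }
  return files;
}
```

Checking `types` first lets a handler return early for plain-text pastes, so normal typing and pasting into the prompt keep working as before.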

mingming-ma · 2023-11-29T23:05:08Z

@humphd I've done a rebase. Are there still any conflicts?

humphd · 2023-11-29T23:09:35Z

No, looks good. I want to do a pass over the code too, I haven't read it yet. I'll try to do that this week.

mingming-ma · 2023-11-29T23:22:30Z

That's great! This PR involves many components, so no rush on the review. Honestly, the code implementation is still quite rough. Can’t wait to see your thoughts on it!

humphd · 2023-11-30T11:41:12Z

Another comment from user testing:

When I click the paper-clip icon to attach a file, after selecting the file in the Open modal dialog box and returning to the page, the prompt input textarea is no longer focused.

I think this is good feedback. We should re-focus the prompt so the user can start typing right away.
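Re-focusing after the file dialog can be handled in the same callback that receives the selected files. This is a hedged sketch with hypothetical names (`Focusable`, `RefLike`, `makeFileSelectedHandler`); in a React component the ref would come from `useRef` and point at the prompt's `<textarea>`.

```typescript
// Anything with a focus() method, e.g. an HTMLTextAreaElement in the browser.
interface Focusable {
  focus(): void;
}

// A React-style ref object ({ current }), written out by hand for this sketch.
type RefLike<T> = { current: T | null };

// Hypothetical handler: attach the chosen files, then return focus to the prompt.
function makeFileSelectedHandler<F>(
  attach: (files: F[]) => void,
  promptRef: RefLike<Focusable>
): (files: F[]) => void {
  return (files) => {
    if (files.length > 0) {
      attach(files);
    }
    // Refocus even if no files were chosen, so the user can keep typing.
    promptRef.current?.focus();
  };
}
```

The same pattern would also apply anywhere else focus leaves the prompt, such as closing a modal.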

mingming-ma · 2023-11-30T16:12:52Z

Nice catch 👍 just fixed that

humphd · 2023-12-05T20:27:56Z

src/components/Message/MessageBase.tsx

-                </Markdown>
+                <>
+                  {image.map((image, index) => (
+                    <img


Do you want to use https://chakra-ui.com/docs/components/image/usage here, and have it fill the width better?

Fixed at cb915e3

humphd · 2023-12-05T20:28:56Z

src/components/PromptForm/ClipIcon.tsx

@@ -0,0 +1,52 @@
+import { useState, useRef } from "react";


Can we rename Clip everywhere, including the filename, to be Attach or something that better describes what this is for?

Fixed at 3dea32b

humphd · 2023-12-05T20:29:12Z

src/components/PromptForm/ClipIcon.tsx

+
+export default function ClipIcon({ isDisabled = false, onFileSelected }: ClipIconProps) {
+  const isMobile = useMobileBreakpoint();
+  const [colorScheme /*, setColorScheme */] = useState<"blue" | "red">("blue");


If you don't need the setter, please remove it.

Fixed at c65d529

src/components/PromptForm/DesktopPromptForm.tsx

humphd · 2023-12-05T20:35:00Z

src/lib/ChatCraftMessage.ts

    const text = this.text;
+
+    const textAndImage: OpenAI.Chat.Completions.ChatCompletionContentPart[] = [];


This is oddly named, since it might not have an image in it. I'd call it content or something more generic.

Fixed at 21c96ef

humphd · 2023-12-05T20:35:24Z

src/lib/ChatCraftMessage.ts

+
+    const textAndImage: OpenAI.Chat.Completions.ChatCompletionContentPart[] = [];
+    textAndImage.push({ type: "text", text: this.text });
+    if (this.image && this.image.length > 0) {


Why would this.image ever be undefined here?

Fixed at fba7bec
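For reference, building the message `content` array from text plus image URLs might look like the sketch below. The `ContentPart` union is a hand-written stand-in for openai-node's `ChatCompletionContentPart` (text parts and `image_url` parts), and `toContentParts` is a hypothetical helper, not the code in `ChatCraftMessage.ts`.

```typescript
// Hand-written stand-in for OpenAI's ChatCompletionContentPart union.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Hypothetical helper: build the `content` array for a user message.
// `imageUrls` defaults to [] so messages without images still work.
function toContentParts(text: string, imageUrls: string[] = []): ContentPart[] {
  const content: ContentPart[] = [{ type: "text", text }];
  for (const url of imageUrls) {
    content.push({ type: "image_url", image_url: { url } });
  }
  return content;
}

const parts = toContentParts("What is in this picture?", ["https://example.com/cat.png"]);
console.log(parts.length); // 2
```

Because the array always starts with the text part, the generic name `content` fits whether or not any images are attached.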

src/lib/ChatCraftModel.ts

src/lib/ai.ts

humphd · 2023-12-05T20:37:48Z

src/lib/db.ts

@@ -19,7 +19,8 @@ export type ChatCraftMessageTable = {
  user?: User;
  func?: FunctionCallParams | FunctionCallResult;
  text: string;
-  versions?: { id: string; date: Date; model: string; text: string }[];
+  image: string[];
+  versions?: { id: string; date: Date; model: string; text: string; image: string[] }[];


I think image: string[] has to be optional, since no existing data has it. If you want to include it like you have here, you need to run a migration step to add [] to all old messages in the db.

Fixed at 5d3143b
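Making the field optional means every reader of old rows has to cope with it being missing. A minimal sketch of normalizing on read, with a trimmed-down, hypothetical `MessageRow` type and `withImages` helper (not the real `ChatCraftMessageTable`). The alternative the reviewer mentions, a migration, would instead write `[]` into existing rows once; with Dexie (assuming that is the IndexedDB wrapper in use) that would be a `version(...).upgrade(...)` step.

```typescript
// Hypothetical, trimmed-down row shape; the real table has more fields.
type MessageRow = {
  id: string;
  text: string;
  image?: string[]; // optional: rows written before this change have no images
};

// Normalize a row read from IndexedDB so callers can always treat `image` as an array.
function withImages(row: MessageRow): MessageRow & { image: string[] } {
  return { ...row, image: row.image ?? [] };
}

const oldRow = withImages({ id: "1", text: "hello" });
console.log(oldRow.image.length); // 0
```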

humphd · 2023-12-12T16:30:34Z

@mingming-ma can you resolve the comments I raised above that have been fixed by your recent work?

mingming-ma · 2023-12-12T16:38:07Z

@humphd Absolutely! I didn't have much free time at the end of the semester, but now that it's behind me, I'm committed to making the necessary changes. Thanks a lot for your feedback; if you can give me another two to three days, I'll do my best to get it done.

mingming-ma · 2023-12-13T16:49:36Z

@humphd I think I have fixed everything except this one:

> Move this logic to the Model class, similar to other checks we do for OpenAI specific things (e.g., function calling).
> Call this in your useEffect above.

I tried to refactor it but haven't found a good approach yet. Can you give me a few more hints, or should we address this in a future pull request?
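One way to read the suggestion is to expose image support as a property on the model, so components only ask `model.supportsImages` (the name the final PR uses in `ai.ts`). The class below is a hypothetical sketch, not ChatCraft's `ChatCraftModel`; in particular, the name-based check (model ids containing `"vision"`) is an assumption that happened to match `gpt-4-vision-preview` at the time.

```typescript
// Hypothetical model class sketch, NOT the project's actual model class.
class ModelSketch {
  readonly id: string;

  constructor(id: string) {
    this.id = id;
  }

  // Assumption: vision-capable models are recognizable by "vision" in their id.
  get supportsImages(): boolean {
    return this.id.includes("vision");
  }
}

console.log(new ModelSketch("gpt-4-vision-preview").supportsImages); // true
console.log(new ModelSketch("gpt-3.5-turbo").supportsImages); // false
```

Keeping the check in one place means a component's `useEffect` only needs to compare the attached images against `model.supportsImages`, and the rule can change later without touching the UI.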

humphd · 2024-01-13T16:48:59Z

Two more requests:

- Support adding more than one image at a time in the file picker. Right now you can add multiple images, but not all at once.
- When there are multiple images, put a number on each one so it's possible to refer to them. The LLM seems to understand the order of the images you send in the array, so as long as the numbering matches that order, it might be OK:

In the example above, I have 2 screenshots, but no good way to refer to them in my prompt. It would help if there were a "1" and a "2" over the top-left corner, similar to your current "X".
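Both requests can stay small. Selecting several files at once is what the standard `multiple` attribute on `<input type="file">` does, and the badge numbers just need to follow the same order as the array sent to the model. A hedged sketch with a hypothetical `labelImages` helper; it relies on the observation above that the model seems to respect array order, which the thread does not confirm beyond testing.

```typescript
// Hypothetical helper: pair each attached image with a 1-based label that matches
// its position in the array sent to the model, so "image 2" in the prompt and the
// "2" badge in the UI refer to the same picture.
type LabeledImage = { label: number; url: string };

function labelImages(urls: string[]): LabeledImage[] {
  return urls.map((url, index) => ({ label: index + 1, url }));
}

const labeled = labelImages(["https://example.com/first.png", "https://example.com/second.png"]);
console.log(labeled.map((img) => img.label)); // [ 1, 2 ]
```

Deriving the label from the array index (rather than storing it) keeps the numbers correct when an image is removed with the "X" button.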

rjwignar · 2024-01-20T20:14:17Z

I've been playing around with this feature for a bit and it's really neat!
One thing I noticed is that when clicking a small image in the input area and then closing the larger version (whether by pressing "Esc" or clicking the Close Button), the input textarea is no longer focused.

Should the prompt be re-focused after closing the full image?


mingming-ma · 2024-02-09T01:36:02Z

@humphd Got it! Rebase done; only the `DataTransfer.types` issue is left, and I'll look into it soon.

humphd · 2024-02-09T01:47:55Z

> @humphd Got it! Rebase done; only the `DataTransfer.types` issue is left, and I'll look into it soon.

Excited!

humphd

Amazing. A couple small things, and this is good to go from my perspective.

I'm so happy to see this finished, @mingming-ma! Thank you for following through on what you started in the fall. This is an epic feature. People are going to be blown away.

src/Chat/ChatBase.tsx

src/components/PromptForm/DesktopPromptForm.tsx

humphd · 2024-02-09T02:24:13Z

src/lib/ai.ts

+     After setting this value, seems no cutoff
+     Tested on version openai@4.24.7, without max_tokens response still cutoff
+     */
+    max_tokens: model.supportsImages ? 4096 : undefined,


I feel like this may come back to burn us. Let's try to remember that we did this...

I'll keep checking newer openai-node versions to see whether this is still needed.

mingming-ma · 2024-02-09T03:09:33Z

@humphd Thanks a bunch for all the feedback! I'm also very excited about this; it's my biggest PR yet! 😄 Hopefully it will land soon!

src/components/PromptForm/DesktopPromptForm.tsx

mingming-ma · 2024-02-09T17:40:14Z

@humphd That's strange. I pushed a new commit (f679e4e) that fixes it, so I marked the comment as resolved. Can you check again?

humphd · 2024-02-09T17:41:43Z

@mingming-ma bizarre, I wonder if the GitHub UI was lagging. It wasn't there when I reviewed. I apologize. Thanks for drawing my attention to this.

humphd

Ship this 🚢 🚢 🚢!

mingming-ma · 2024-02-09T17:44:35Z

No worries! Now it's my turn to lag 😄 I'll continue with the mobile version. Thanks again for all the feedback!

humphd · 2024-02-09T18:36:23Z

@mingming-ma when you're happy, please merge so we can avoid rebase issues.

mingming-ma marked this pull request as ready for review November 23, 2023 18:12

mingming-ma mentioned this pull request Nov 23, 2023

Allow using camera as input on mobile with gpt-4 vision #282

Closed

mingming-ma force-pushed the issue-285 branch from 78fab1d to 8146974 Compare November 29, 2023 22:25

humphd requested changes Dec 5, 2023


mingming-ma force-pushed the issue-285 branch from 82d5089 to 7b46b98 Compare December 11, 2023 20:14

mingming-ma requested a review from humphd December 13, 2023 16:50

humphd mentioned this pull request Jan 11, 2024

Ability to attach file(s) to a chat #325

Open

humphd requested review from rjwignar and Rachit1313 January 24, 2024 16:26

mingming-ma and others added 10 commits February 8, 2024 20:29

support continue use non vision model when history has images

8c1d624

Not being able to send empty messages

725f5cf

Fix image styling when using gpt4-vision (#419)

de8b5fb

* Fix image styling when using gpt4-vision * Allow better scaling of images

remove the unused import

7f1109c

fix share chat image missing

0e8d50d

refact useEffect dependencies

68830a6

rename Attach to AttachFileButton

6f725f5

rename image to imageUrls

1a592b6

fix typo

1f023ea

use maxHeight for responsive modal

11ebc37

mingming-ma force-pushed the issue-285 branch from ffcd269 to 11ebc37 Compare February 9, 2024 01:31

mingming-ma requested review from humphd, Amnish04 and kliu57 February 9, 2024 02:13

humphd requested changes Feb 9, 2024


refactor code

3a27281

update dependencies

f679e4e

mingming-ma requested a review from humphd February 9, 2024 14:18

humphd requested changes Feb 9, 2024


src/components/PromptForm/DesktopPromptForm.tsx (outdated, resolved)

humphd approved these changes Feb 9, 2024


mingming-ma merged commit ade5dcd into main Feb 9, 2024
4 checks passed

mingming-ma mentioned this pull request Apr 4, 2024

Remove workaround hardcode in the vision model #566

Merged

mingming-ma deleted the issue-285 branch April 10, 2024 16:32
