Allow using images as input with gpt-4 vision #286

Merged
merged 55 commits into from
Feb 9, 2024

@mingming-ma mingming-ma commented Nov 10, 2023

Fix #285

Result

Tasks

  • Fix response cutoff bug
    • Found a workaround: setting max_tokens. Tested on the latest version, openai@v4.20.0; without max_tokens the response is still cut off.
  • Update the model based on file type
  • Attach file method
    • Clip button
    • Direct paste
  • Better preview if the attached file is an image
    • Full width for the small version
    • Click on the small image to show a larger version with a "close" button in the middle of the window.
  • Keep images in chat history
  • Images in messages should use width 100% and be clickable to see the full version.
  • Support adding more than 1 image at a time in the file picker
  • Add an index number to the top left corner of each image
  • Re-focus the prompt after closing the full image
  • Fix the no-text, no-response bug
  • Fix number button shape: oval -> round
  • Fix number/cross button margins being too close
  • Fix cursor -> cursor: pointer

Issues

sweep-ai bot commented Nov 10, 2023

Apply Sweep Rules to your PR?

  • Apply: All new business logic should have corresponding unit tests.
  • Apply: Refactor large functions to be more modular.

@mingming-ma
Collaborator Author

Response cutoff bug investigation

Results from Python scripts for local images vs. ChatCraft

image image

@humphd
Collaborator

humphd commented Nov 11, 2023

This might be worth filing upstream with the https://github.com/openai/openai-node repo itself, since it doesn't seem to be an OpenAI issue.

cloudflare-pages bot commented Nov 11, 2023

Deploying with Cloudflare Pages

Latest commit: f679e4e
Status: ✅  Deploy successful!
Preview URL: https://f857fa27.console-overthinker-dev.pages.dev
Branch Preview URL: https://issue-285.console-overthinker-dev.pages.dev

@mingming-ma
Collaborator Author

> This might be worth filing upstream with the https://github.com/openai/openai-node repo itself, since it doesn't seem to be an OpenAI issue.

Yeah, I think openai-node will be updated in the coming days. I also found another thing: after setting max_tokens in chatCompletionParams, the response no longer seems to be cut off.

    max_tokens: 4096,

Result:
image

I found someone saying that max_tokens defaults to the maximum value for the current model. Looking into the openai-node code, ChatCompletionCreateParamsBase currently only lists:

  model:
    | (string & {})
    | 'gpt-4'
    | 'gpt-4-0314'
    | 'gpt-4-0613'
    | 'gpt-4-32k'
    | 'gpt-4-32k-0314'
    | 'gpt-4-32k-0613'
    | 'gpt-3.5-turbo'
    | 'gpt-3.5-turbo-16k'
    | 'gpt-3.5-turbo-0301'
    | 'gpt-3.5-turbo-0613'
    | 'gpt-3.5-turbo-16k-0613';

Maybe this list of models affects the default max_tokens for gpt-4-vision-preview. I think I can leave this for now and work on the other tasks.
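
The workaround described above could be sketched roughly as follows. This is only an illustration under assumptions: `ModelInfo`, `CompletionParams`, and `buildCompletionParams` are hypothetical names and simplified shapes, not the project's or openai-node's actual types. Only the `max_tokens: 4096` value comes from this thread.

```typescript
// Hypothetical, simplified shapes; the real openai-node types are richer.
type ModelInfo = { id: string; supportsImages: boolean };

type CompletionParams = {
  model: string;
  max_tokens?: number;
};

// Set max_tokens explicitly only for image-capable models, since that is
// where the cutoff was observed. Other models keep the API default.
function buildCompletionParams(model: ModelInfo): CompletionParams {
  const params: CompletionParams = { model: model.id };
  if (model.supportsImages) {
    params.max_tokens = 4096;
  }
  return params;
}

const visionParams = buildCompletionParams({ id: "gpt-4-vision-preview", supportsImages: true });
const textParams = buildCompletionParams({ id: "gpt-4", supportsImages: false });
```

Leaving `max_tokens` unset for text-only models keeps their behaviour unchanged, so the workaround stays scoped to the case where the cutoff was actually seen.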

@humphd
Collaborator

humphd commented Nov 23, 2023

@mingming-ma this is coming along really well, I'm impressed. A few things I notice testing this today:

Screenshot 2023-11-23 at 8 48 18 AM
  1. I wonder if the close X circle could go over top of the image (maybe top-right corner?) instead of below it. Could happen in follow-up/someone else could do.
  2. I want to be able to click on the small version of the image in the prompt form and have it go big so I can see it better (like a light-box that shows in the centre of the screen and you can close?). Could happen in follow-up/someone else could do.
  3. I love how the model changes when I attach an image; that's smart.
  4. When I send the image + prompt, I lose the image:
Screenshot 2023-11-23 at 8 51 13 AM

We should probably alter the message component so that it can show the image that was attached.

Keep going, you're doing great work here!

@mingming-ma
Collaborator Author

mingming-ma commented Nov 23, 2023

@humphd Thanks a lot for the feedback! I just tested storing images in the db, so now we can keep images after submitting. Note that since I changed the db schema, it's better to try this in an Incognito window so that it doesn't affect your existing IndexedDB. I also tested in my normal window and it seems to be working well.

@humphd
Collaborator

humphd commented Nov 24, 2023

@mingming-ma do you want to fix the conflicts you have in this branch?

@mingming-ma
Collaborator Author

@humphd Sure. I don't see any conflicts on my side for now; let me know if there are any and I'll fix them.

@humphd
Collaborator

humphd commented Nov 29, 2023

For some reason, this is showing:

This branch cannot be rebased due to conflicts

At any rate, some more feedback.

First, I love this. I showed it to some other people and they were blown away too. Some comments they made:

  1. Should be able to paste an image and have it get attached to the current prompt (can happen in follow-up)
  2. Images in messages should use width 100%, and be clickable to see full version.

Another optimization we should make: what is the max size of the image that OpenAI will process? We should resize the image in the browser so we don't waste bandwidth when sending. Can happen in follow-up.
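
As a rough sketch of the resizing idea, the target dimensions could be computed before drawing the image to a canvas in the browser. The limit used here (2048) is a placeholder, not a confirmed OpenAI value, and `Size` and `fitWithin` are hypothetical names introduced only for this example.

```typescript
type Size = { width: number; height: number };

// Scale a size down so its longer side is at most maxSide, preserving the
// aspect ratio. Images already within the limit are returned unchanged.
function fitWithin(size: Size, maxSide: number): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxSide) {
    return { ...size };
  }
  const scale = maxSide / longest;
  return {
    width: Math.round(size.width * scale),
    height: Math.round(size.height * scale),
  };
}

// 2048 is a placeholder limit, not a confirmed OpenAI value.
const target = fitWithin({ width: 4000, height: 3000 }, 2048);
const unchanged = fitWithin({ width: 800, height: 600 }, 2048);
```

The resulting size could then be used for a canvas `drawImage` call before exporting the smaller image, which is where the bandwidth saving would come from.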

@mingming-ma
Collaborator Author

@humphd Oh, it must be because I merged the main branch midway. Thanks a lot for the feedback! I was thinking about implementing the paste function yesterday, so we had the same thought! I'll update the width and check the image size for the optimization.

@mingming-ma
Collaborator Author

@humphd I've done a rebase. Are there still any conflicts?

@humphd
Collaborator

humphd commented Nov 29, 2023

No, looks good. I want to do a pass over the code too, I haven't read it yet. I'll try to do that this week.

@mingming-ma
Collaborator Author

That's great! This PR involves many components, so no rush on the review. Honestly, the code implementation is still quite rough. Can’t wait to see your thoughts on it!

@humphd
Collaborator

humphd commented Nov 30, 2023

Another comment from user testing:

When I click the paper-clip icon to attach a file, after selecting the file in the Open modal dialog box and returning to the page, the prompt input textarea is no longer focused.

I think this is good feedback. We should re-focus the prompt so the user can start typing right away.

@mingming-ma
Collaborator Author

Nice catch 👍 just fixed that

</Markdown>
<>
{image.map((image, index) => (
<img
Collaborator

Do you want to use https://chakra-ui.com/docs/components/image/usage here, and have it fill the width better?

Collaborator Author

Fixed at cb915e3

@@ -0,0 +1,52 @@
import { useState, useRef } from "react";
Collaborator

Can we rename Clip everywhere, including the filename, to be Attach or something that better describes what this is for?

Collaborator Author

Fixed at 3dea32b


export default function ClipIcon({ isDisabled = false, onFileSelected }: ClipIconProps) {
const isMobile = useMobileBreakpoint();
const [colorScheme /*, setColorScheme */] = useState<"blue" | "red">("blue");
Collaborator

If you don't need the setter, please remove it.

Collaborator Author

Fixed at c65d529

src/components/PromptForm/DesktopPromptForm.tsx (outdated, resolved)
const text = this.text;

const textAndImage: OpenAI.Chat.Completions.ChatCompletionContentPart[] = [];
Collaborator

This is oddly named, since it might not have an image in it. I'd call it content or something more generic.

Collaborator Author

Fixed at 21c96ef


const textAndImage: OpenAI.Chat.Completions.ChatCompletionContentPart[] = [];
textAndImage.push({ type: "text", text: this.text });
if (this.image && this.image.length > 0) {
Collaborator

Why would this.image ever be undefined here?

Collaborator Author

Fixed at fba7bec
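
For readers following the two review comments above (the generic name and the always-defined image array), a minimal sketch of how the content array might be built is shown here. The part types are simplified stand-ins that mirror the `text` and `image_url` content part shapes in the OpenAI chat API; `buildContent` is a hypothetical helper, not the code from this PR.

```typescript
// Simplified stand-ins for the openai-node content part types.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextPart | ImagePart;

// Always start with the text part, then append one part per image URL.
// Defaulting imageUrls to [] means callers never pass undefined.
function buildContent(text: string, imageUrls: string[] = []): ContentPart[] {
  const content: ContentPart[] = [{ type: "text", text }];
  for (const url of imageUrls) {
    content.push({ type: "image_url", image_url: { url } });
  }
  return content;
}

const textOnly = buildContent("Describe this");
const withImage = buildContent("Describe this", ["data:image/png;base64,AAAA"]);
```

Because the array defaults to empty, the loop simply does nothing for text-only messages and no undefined check is needed.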

src/lib/ChatCraftModel.ts (resolved)
src/lib/ai.ts (resolved)
src/lib/db.ts Outdated
@@ -19,7 +19,8 @@ export type ChatCraftMessageTable = {
   user?: User;
   func?: FunctionCallParams | FunctionCallResult;
   text: string;
-  versions?: { id: string; date: Date; model: string; text: string }[];
+  image: string[];
+  versions?: { id: string; date: Date; model: string; text: string; image: string[] }[];
Collaborator

I think image: string[] has to be optional, since no existing data has it. If you want to include it like you have here, you need to run a migration step to add [] to all old messages in the db.

Collaborator Author

Fixed at 5d3143b
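
The migration option mentioned in the review could look something like the sketch below: a pure function that fills in an empty array on older records. The record shapes are trimmed-down and hypothetical, and wiring this into the database's upgrade step is not shown, so treat it as an illustration of the idea rather than what commit 5d3143b does.

```typescript
// Hypothetical, trimmed-down record shapes for illustration only.
type StoredVersion = { id: string; text: string; image?: string[] };
type StoredMessage = {
  id: string;
  text: string;
  image?: string[];
  versions?: StoredVersion[];
};

// Return a copy of the message where `image` is always an array,
// both on the message itself and on each of its versions.
function withImageDefault(message: StoredMessage): StoredMessage {
  const migrated: StoredMessage = { ...message, image: message.image ?? [] };
  if (message.versions) {
    migrated.versions = message.versions.map((v) => ({ ...v, image: v.image ?? [] }));
  }
  return migrated;
}

const oldRecord: StoredMessage = { id: "1", text: "hi", versions: [{ id: "v1", text: "hi" }] };
const migratedRecord = withImageDefault(oldRecord);
const newRecord = withImageDefault({ id: "2", text: "look", image: ["data:image/png;base64,AAAA"] });
```

Making the field optional instead avoids the migration entirely, at the cost of handling `undefined` wherever the field is read.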

@humphd
Collaborator

humphd commented Dec 12, 2023

@mingming-ma can you resolve the comments I raised above that have been fixed by your recent work?

@mingming-ma
Collaborator Author

@humphd Absolutely! I didn't have much free time at the end of the semester, but now that it's behind me, I'm committed to making the necessary changes. Thanks a lot for your feedback, and if you can give me another two to three days, I'll do my best to get it done.

@mingming-ma
Collaborator Author

@humphd I think I have fixed most of them, except these:

> Move this logic to the Model class, similar to other checks we do for OpenAI specific things (e.g., function calling).
> Call this in your useEffect above.

I tried to refactor but haven't found a good approach yet. Can you give me a few more hints, or should we consider addressing this in a future pull request?

@humphd
Collaborator

humphd commented Jan 13, 2024

Two more requests:

  1. Support adding more than 1 image at a time in the file picker. You can add multiple images, but not at once.
  2. When there are multiple images, put a number on each one, so it's possible to refer to them. The LLM seems to understand the order of the images you send in the array, so as long as it matched that order, it might be OK:
Screenshot 2024-01-13 at 11 43 46 AM

In the example above, I have 2 screenshots, but no good way to describe them in my prompt. It would help if there were a "1" and a "2" over the top-left corner, similar to your current "X".
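
One way to keep the on-screen numbers aligned with the order in which images are sent is to derive the labels directly from the array index. This is a small sketch under that assumption; `NumberedImage` and `numberImages` are hypothetical names.

```typescript
type NumberedImage = { label: number; url: string };

// Labels start at 1 and follow the array order, which is also the order
// in which the images would be sent to the model.
function numberImages(urls: string[]): NumberedImage[] {
  return urls.map((url, index) => ({ label: index + 1, url }));
}

const numbered = numberImages(["first.png", "second.png"]);
```

Since the label is computed from the index rather than stored, removing an image renumbers the rest automatically and the labels cannot drift from the send order.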

@rjwignar
Collaborator

I've been playing around with this feature for a bit and it's really neat!
One thing I noticed is that when clicking a small image in the input area and then closing the larger version (whether by pressing "Esc" or clicking the Close Button), the input textarea is no longer focused.

Should the prompt be re-focused after closing the full image?

@mingming-ma
Collaborator Author

@humphd Got it! The rebase is done; only the DataTransfer.types issue is left, and I'll look into it soon.

@humphd
Collaborator

humphd commented Feb 9, 2024

> @humphd Got it! The rebase is done; only the DataTransfer.types issue is left, and I'll look into it soon.

Excited!

Collaborator

@humphd humphd left a comment

Amazing. A couple small things, and this is good to go from my perspective.

I'm so happy to see this finished, @mingming-ma! Thank you for following through on what you started in the fall. This is an epic feature. People are going to be blown away.

src/Chat/ChatBase.tsx (outdated, resolved)
src/components/PromptForm/DesktopPromptForm.tsx (outdated, resolved)
src/components/PromptForm/DesktopPromptForm.tsx (outdated, resolved)
After setting this value, seems no cutoff
Tested on version openai@4.24.7, without max_tokens response still cutoff
*/
max_tokens: model.supportsImages ? 4096 : undefined,
Collaborator

I feel like this may come back to burn us. Let's try to remember that we did this...

Collaborator Author

@mingming-ma mingming-ma Feb 9, 2024

I'll keep checking this against the newest versions.

@mingming-ma
Collaborator Author

@humphd Thanks a bunch for all the feedback! I was also very excited to tackle this, and it's my biggest PR yet! 😄 Luckily, it will land soon!

@mingming-ma
Collaborator Author

@humphd It's strange. I committed a fix in f679e4e, so I marked that as resolved. Can you check again? (f679e4e)

image

@humphd
Collaborator

humphd commented Feb 9, 2024

@mingming-ma bizarre, I wonder if the GitHub UI was lagging. It wasn't there when I reviewed. I apologize. Thanks for drawing my attention to this.

Collaborator

@humphd humphd left a comment

Ship this 🚢 🚢 🚢!

@mingming-ma
Collaborator Author

mingming-ma commented Feb 9, 2024

No worries! Now it's my turn to be lagging 😄 I'll continue with the mobile version. Thanks again for all the feedback!

@humphd
Collaborator

humphd commented Feb 9, 2024

@mingming-ma when you're happy, please merge so we can avoid rebase issues.

@mingming-ma mingming-ma merged commit ade5dcd into main Feb 9, 2024
4 checks passed
@mingming-ma mingming-ma deleted the issue-285 branch April 10, 2024 16:32
Development

Successfully merging this pull request may close these issues.

Add - Allow using images as input with gpt-4 vision
7 participants