
[Doc/Feature]: Llava 1.5 in OpenAI compatible server #3873

Closed
stikkireddy opened this issue Apr 5, 2024 · 10 comments
Labels: documentation, good first issue, help wanted

Comments

@stikkireddy

📚 The doc issue

Hey vLLM team, it looks like support for LLaVA 1.5 has been added, but there are no docs or examples on how to use it via the API server. Are there any reference examples for using LLaVA via the OpenAI SDK?

Suggest a potential alternative/fix

No response

@stikkireddy added the documentation label Apr 5, 2024
@simon-mo added the help wanted label Apr 5, 2024
@simon-mo
Collaborator

simon-mo commented Apr 5, 2024

I believe the image input protocol has indeed not been implemented yet, so this is more than a documentation issue.

@simon-mo changed the title from "[Doc]: Llava 1.5 documentation via OpenAI compatible server" to "[Doc/Feature]: Llava 1.5 in OpenAI compatible server" Apr 5, 2024
@simon-mo added the good first issue label Apr 5, 2024
@alsichcan

alsichcan commented Apr 9, 2024

PR #3042, which introduced the LLaVA feature, does not appear to include support for the OpenAI-compatible server. Based on the documentation, it should be feasible to extend the existing OpenAI-compatible server (see the Image Input tab in the link below) to support this feature without developing a dedicated server specifically for image inputs. However, it is important to note the distinctions between GPT-4V and LLaVA; in particular, LLaVA currently supports neither multiple image inputs nor the 'detail' parameter.

According to OpenAI Documentation,

GPT-4 with vision is currently available to all developers who have access to GPT-4 via the gpt-4-vision-preview model and the Chat Completions API which has been updated to support image inputs.

  • GPT-4 Turbo with vision may behave slightly differently than GPT-4 Turbo, due to a system message we automatically insert into the conversation
  • GPT-4 Turbo with vision is the same as the GPT-4 Turbo preview model and performs equally as well on text tasks but has vision capabilities added
  • Vision is just one of many capabilities the model has

Example of uploading base64-encoded images

import base64
import requests

# OpenAI API key
api_key = "YOUR_OPENAI_API_KEY"

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

print(response.json())
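
For reference, an equivalent request via the OpenAI Python SDK pointed at a vLLM OpenAI-compatible server might look like the sketch below. This is only a sketch: the base URL, API key, and model name are placeholders, and whether the server accepts the image_url content type depends on the feature discussed in this issue.

import base64

from openai import OpenAI

# Placeholder values: vLLM's OpenAI-compatible server typically ignores the API key;
# the base URL and model name depend on how the server is launched.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# Encode the local image as a base64 string, as in the example above.
with open("path_to_your_image.jpg", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)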

Please let me know if anyone is already working on implementing this feature.
If not, I'm willing to take on the task and aim to complete it by the end of April (hopefully).

@DarkLight1337
Collaborator

DarkLight1337 commented Apr 9, 2024

Based on examples/llava_example.py, I have recently forked vllm-rocm to support image input by refactoring OpenAIServingChat. I have already verified that the model generates useful output when given OpenAI's quick start example.

Note: This change adds pillow as a dependency since it is used to read the image from bytes.
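
For context, here is a minimal sketch of what reading an image from bytes with pillow might look like; the helper name and structure are hypothetical, not the actual code in the fork.

import base64
import io

from PIL import Image


def load_image_from_data_url(data_url: str) -> Image.Image:
    # Hypothetical helper: decode a base64 data URL into a PIL image.
    # Strip the "data:image/...;base64," prefix and decode the payload.
    _, encoded = data_url.split(",", 1)
    image_bytes = base64.b64decode(encoded)
    # Pillow reads the image from an in-memory bytes buffer.
    return Image.open(io.BytesIO(image_bytes))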

However, there is more work to be done:

  • The only model I have tested so far is llava-hf/llava-1.5-7b-hf since vLLM has existing support for its LlavaForConditionalGeneration architecture. Unfortunately, their config does not provide a chat template, so you have to provide it via command line (--chat-template examples/template_llava.jinja) which is quite inconvenient.
  • We should at least add support for LlavaLlamaForCausalLM architecture which is adopted by the original author (liuhaotian/llava-v1.5-7b).
  • It is unclear whether this change enables the API to work with other VLMs.

UPDATE: I have created a new branch on my fork (openai-vision-api) that consolidates my changes so far. The original upstream branch is now directly synced with upstream/upstream (discarding my previous commits) to be in line with the usual naming conventions.

@stikkireddy
Author

Thankfully I only need LLaVA 😄!
@DarkLight1337 do you plan on pushing this back to vLLM along with the chat template?

@DarkLight1337
Collaborator

DarkLight1337 commented Apr 9, 2024

Thankfully I only need LLaVA 😄! @DarkLight1337 do you plan on pushing this back to vLLM along with the chat template?

I'll create a PR once more testing has been done.

It would be great if we could compile a list of models that work/don't work with my implementation of this API. Currently, I assume that at most one image is provided since it appears that this is also the case for vLLM internals. How difficult would it be to support multiple images (possibly of different sizes)?

@simon-mo
Collaborator

simon-mo commented Apr 9, 2024

Do there exist models that support multiple image inputs?

@DarkLight1337
Collaborator

DarkLight1337 commented Apr 9, 2024

GPT-4's API supports multiple images, so I guess their model can already handle such input.

Looking at open source, I found that MMICL explicitly supports multiple images per text prompt. They use <imagej> as the token to represent the jth image. To accommodate this, we may need to add a config option to specify how to insert image tokens into the text prompt. Currently, we use <image> * image_feature_size to represent each image; it would be more convenient to follow the original models which only use a single <image> token per image, regardless of feature size.
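
To make the current scheme concrete, here is an illustrative sketch; the names and the feature size are assumptions for illustration, not vLLM's actual implementation.

# Illustrative only: under the current scheme, the image placeholder is repeated
# once per image feature before the prompt reaches the language model.
IMAGE_TOKEN = "<image>"
IMAGE_FEATURE_SIZE = 576  # e.g. a 24 x 24 patch grid for LLaVA-1.5 at 336 px


def expand_image_tokens(prompt: str, feature_size: int = IMAGE_FEATURE_SIZE) -> str:
    # Replace the single placeholder with feature_size copies of the token.
    return prompt.replace(IMAGE_TOKEN, IMAGE_TOKEN * feature_size, 1)


expanded = expand_image_tokens("USER: <image>\nWhat's in this image? ASSISTANT:")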

@DarkLight1337
Collaborator

DarkLight1337 commented Apr 10, 2024

I have opened a PR to support single-image input, with a POC using llava-hf/llava-1.5-7b-hf. Hopefully, this is enough to get the ball rolling.

We can deal with multi-image input further down the line.

NOTE: If you have previously checked out upstream branch based on this issue, please note that my changes have been moved to the openai-vision-api branch; the upstream branch is now directly synced with upstream/upstream (discarding my previous commits) to be in line with the usual naming conventions.

@ywang96
Collaborator

ywang96 commented May 25, 2024

FYI - this is WIP and we plan to have it in the next major release. See our plan here #4194 (comment)

@ywang96
Collaborator

ywang96 commented Jun 7, 2024

Closing this as we merged #5237

@ywang96 closed this as completed Jun 7, 2024