
[Frontend]openai base64 embedding: remove the message blocker for base64 embedding #5935

Open · wants to merge 1 commit into base: main
Conversation

llmpros commented Jun 27, 2024

Based on my tests, the current release 0.5.0.post1 is able to handle both float and base64 smoothly; we just need to remove the blocker to enable base64.

Test 1:

from langchain_openai import OpenAIEmbeddings

emb_model = OpenAIEmbeddings(
    model="/data00/e5-mistral-7b-instruct/",
    openai_api_base="http://10.37.78.125:8000/v1",
    openai_api_key="EMPTY")

embedding = emb_model.embed_query("A sentence to encode.")
print(embedding)

Output 1:

[  ......
-0.005718231201171875, 0.01071929931640625]

Test 2 (same client as Test 1, with input ["Hello my name is", "The best thing about vLLM is that it supports many different models"]):

Output 2:

[  ......
0.0242767333984375, 0.0036792755126953125, 0.0306549072265625]

DarkLight1337 (Collaborator) commented Jun 28, 2024

To speed up the CI queue, I've cancelled the distributed tests for the latest CI run in this PR since they won't pass anyway until #5905 has been merged. Now that it has been merged, please merge main into your branch so that the CI can pass once again.

@llmpros llmpros force-pushed the main branch 6 times, most recently from ba6f3a2 to 2e1d81c Compare June 28, 2024 21:37
llmpros (Author) commented Jun 28, 2024

@DarkLight1337 I rebased onto main and it seems things are looking better - still 1 check is running.

DarkLight1337 (Collaborator) commented:

Have you verified whether the result is correct or not? (Just because it returns a value doesn't mean it's correct)

llmpros (Author) commented Jun 29, 2024

> Have you verified whether the result is correct or not? (Just because it returns a value doesn't mean it's correct)

Yes, I compared the results from base64 and float (using the diff command); they are 100% identical. Let me know if you want to see the comparison and I will upload it somewhere.

DarkLight1337 (Collaborator) commented Jun 30, 2024

I'm asking because the results you posted above seem to be different. It would be great if you could set up a test case for this!

llmpros (Author) commented Jun 30, 2024

> I'm asking because the results you posted above seem to be different. It would be great if you could set up a test case for this!

I see. The reason they look different is that the inputs are different: in Test 1, the input is "A sentence to encode.", while in Test 2 the input is ["Hello my name is", "The best thing about vLLM is that it supports many different models"].

Makes sense - I am adding unit tests to make sure they cover both float and base64.

@llmpros llmpros force-pushed the main branch 4 times, most recently from 3a9900a to a73a7d5 Compare June 30, 2024 01:54
encoding_format="base64")


assert responses_float.data == responses_base64.data
Collaborator:

From my understanding, the returned data should be base64 encoded if you pass encoding_format="base64", so you should be decoding its output before comparing them.
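As an illustration, a minimal sketch of such a decode-before-compare step (the float64 dtype here is an assumption; it must match whatever dtype the server packs):

```python
import base64

import numpy as np

# hypothetical floats standing in for a real embedding
expected = [0.25, -0.5, 1.0]

# simulate a base64 response payload (float64 bytes; the dtype is an assumption)
payload = base64.b64encode(np.array(expected, dtype="float").tobytes()).decode("utf-8")

# decode the base64 output back into floats before comparing
decoded = np.frombuffer(base64.b64decode(payload), dtype="float").tolist()
assert decoded == expected
```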

Author:

> From my understanding, the returned data should be base64 encoded if you pass encoding_format="base64", so you should be decoding its output before comparing them.

Interestingly, I manually ran the unit test (with encoding_format="base64", e.g. python3 openai_embedding_client.py) and added a print(obj) at https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L99 (on the client side); the output from print(obj) was actually an array of floats, rather than a base64 string.

The embedding model I use is: https://huggingface.co/intfloat/e5-mistral-7b-instruct

DarkLight1337 (Collaborator) commented Jun 30, 2024:

Hmm, perhaps the OpenAI client automatically performs the decoding for you? Nevertheless, we should still perform the encoding on the server side.

Edit: Yeah, that seems to be the case.

https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L102-L110

Author:

> Nevertheless, we should still perform the encoding on server side.

In which part of https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py should we perform the encoding? My understanding is that the input is an array of strings: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py#L90

With encoding_format=base64, from my test, the response from the server (vllm) already looks like an array of floats: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py#L35 (print(final_res.outputs.embedding)).

Thanks for the hints

DarkLight1337 (Collaborator) commented Jun 30, 2024:

We should only apply base64 encoding to the outputs (converting the float array into base64).
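For illustration, a minimal sketch of that output-side encoding (the helper name and float64 dtype are assumptions, not the PR's actual code):

```python
import base64

import numpy as np

def encode_base64_embedding(embedding):
    """Pack a float array into raw bytes, then base64-encode it for the response."""
    raw = np.array(embedding, dtype="float").tobytes()
    return base64.b64encode(raw).decode("utf-8")

encoded = encode_base64_embedding([0.1, 0.2, 0.3])
```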

Author:

I tried encoding the embedding in the response (the latest code checked in); however, adding the base64 encoding failed OpenAI client validation, as follows:

Traceback (most recent call last):
  File "/Users/xxx/vllm/examples/openai_embedding_client.py", line 34, in <module>
    responses_base64 = client.embeddings.create(input=input, model=model, 
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/resources/embeddings.py", line 117, in create
    return self._post(
           ^^^^^^^^^^^
  File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/_base_client.py", line 1250, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/_base_client.py", line 931, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/_base_client.py", line 1030, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "1 validation error for EmbeddingResponseData\nembedding\n  Input should be a valid list [type=list_type, input_value=b'AAAAAADchD8AAAAAABCUPwA...wAAAAAA6JC/AAAAAAAIgD8=', input_type=bytes]\n    For further information visit https://errors.pydantic.dev/2.7/v/list_type", 'type': 'BadRequestError', 'param': None, 'code': 400}

It looks like the validation only accepts numbers, not a base64 string?
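The 400 above is pydantic rejecting bytes where a List[float] is declared. A minimal reproduction of that validation error (a hypothetical model, not vLLM's actual class):

```python
from typing import List

from pydantic import BaseModel, ValidationError

class Data(BaseModel):
    embedding: List[float]

try:
    # bytes payload, as in the error message above
    Data(embedding=b"AAAAAADchD8=")
    raised = False
except ValidationError as err:
    raised = True
    # pydantic v2 reports this as a list_type error
    assert "list_type" in str(err)
```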

Collaborator:

Can you use a debugger to figure out why the code which I linked above is not running to decode the base64 output?

llmpros (Author) commented Jun 30, 2024:

> https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L102-L110

Yeah - at https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L98, since we explicitly pass encoding_format="base64", the client returns the object without further decoding. However, the returned obj is an array of floats, so it did not run the decode section you linked.

(Maybe I am wrong, but) it seems that whatever the encoding_format is, it has no impact on the input to the server (vllm): https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py#L90. From many tests, it looks like the model I use (https://huggingface.co/intfloat/e5-mistral-7b-instruct) always outputs an array of floats.

Maybe we should ask https://github.com/CatherineSue why they disabled base64 in e254497?

Collaborator:

From my understanding, the base64 encoding should only be applied to the output, not the input.

embedding_data = EmbeddingResponseData(
-    index=idx, embedding=final_res.outputs.embedding)
+    index=idx, embedding=[embedding])
Collaborator:

I think here you should either send List[float] (float output) or a str (base64 output)

Author:

> I think here you should either send List[float] (float output) or a str (base64 output)

Thanks for the pointer. I tried with a string (without []), but the OpenAI python client gave the following error:

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "1 validation error for EmbeddingResponseData\nembedding\n  Input should be a valid list [type=list_type, input_value=b'AAAAAADchD8AAAAAABCUPwA...wAAAAAA6JC/AAAAAAAIgD8=', input_type=bytes]\n    For further information visit https://errors.pydantic.dev/2.7/v/list_type", 'type': 'BadRequestError', 'param': None, 'code': 400}

*Input should be a valid list* - that was why I changed to an array, but it then further checks that the contents of the array are numbers.

Collaborator:

I think we should update the definition of EmbeddingResponseData to allow both types of outputs
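A sketch of what that widened definition could look like (the field names other than embedding are assumptions):

```python
from typing import List, Union

from pydantic import BaseModel

class EmbeddingResponseData(BaseModel):
    index: int
    object: str = "embedding"
    # accept either a float list (encoding_format="float")
    # or a base64 string (encoding_format="base64")
    embedding: Union[List[float], str]

# both output shapes now validate
float_data = EmbeddingResponseData(index=0, embedding=[0.1, 0.2])
base64_data = EmbeddingResponseData(index=1, embedding="AAAAAADchD8=")
```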

Etelis (Contributor) commented Jun 30, 2024

Pardon me for joining in here.
It feels like it's not just about removing the error; rather, we need to add base64 encoding.

llmpros (Author) commented Jun 30, 2024

@DarkLight1337 updated the PR and local unit tests pass. Thanks for the suggestions.

@llmpros llmpros force-pushed the main branch 2 times, most recently from fce3afb to 5f82658 Compare June 30, 2024 07:32
decoded_responses_base64_data = []
for data in responses_base64.data:
    decoded_responses_base64_data.append(
        np.frombuffer(base64.b64decode(data.embedding), dtype="float").tolist())
Collaborator:

Hmm, so OpenAI doesn't perform automatic decoding anymore?

Author:

> Hmm, so OpenAI doesn't perform automatic decoding anymore?

Yes - from https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L98, it seems that because we explicitly specify encoding_format as base64 or float, the client does not run the rest of the logic in the parse function.
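A rough sketch of the client-side behavior being described (paraphrasing the linked parser, not its exact code): when the caller explicitly requests base64, the payload is returned untouched; the decode path only runs when the client picked base64 on the caller's behalf. The float64 dtype is an assumption here.

```python
import base64

import numpy as np

def parse_embedding(value, user_requested_base64):
    # caller explicitly asked for base64: hand the payload back as-is
    if user_requested_base64:
        return value
    # otherwise decode the base64 string back into a float list
    if isinstance(value, str):
        return np.frombuffer(base64.b64decode(value), dtype="float").tolist()
    return value

packed = base64.b64encode(np.array([1.5, 2.5], dtype="float").tobytes()).decode("utf-8")
```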

DarkLight1337 (Collaborator) commented Jun 30, 2024:

Ah, I see. Can you add this example file to be explicitly tested in CI? Alternatively @Etelis can implement the test in his PR.

Collaborator:

Also, the linter is currently failing; please run bash format.sh to fix such errors locally.

DarkLight1337 (Collaborator) commented:
Btw, there is no need to force-push since the commits will be squashed before merging anyway.

@llmpros llmpros force-pushed the main branch 4 times, most recently from d487e9f to 91da55c Compare June 30, 2024 08:14
DarkLight1337 (Collaborator) left a comment:

Cleanup

@@ -20,4 +20,4 @@
model=model)

for data in responses.data:
print(data.embedding) # list of float of len 4096
print(data.embedding) # list of float of len 4096
Collaborator:

Suggested change
print(data.embedding) # list of float of len 4096
print(data.embedding) # list of float of len 4096

assert responses_float.data[0].embedding == decoded_responses_base64_data[
0]
assert responses_float.data[1].embedding == decoded_responses_base64_data[
1]
Collaborator:

Suggested change
1]
1]

@@ -141,4 +141,4 @@ def _check_embedding_mode(self, embedding_mode: bool):
logger.warning(
"embedding_mode is False. Embedding API will not work.")
else:
logger.info("Activating the server engine with embedding enabled.")
logger.info("Activating the server engine with embedding enabled.")
Collaborator:

Suggested change
logger.info("Activating the server engine with embedding enabled.")
logger.info("Activating the server engine with embedding enabled.")

@@ -89,7 +89,6 @@ async def create_embedding(self, request: EmbeddingRequest,
try:
prompt_is_tokens, prompts = parse_prompt_format(request.input)
pooling_params = request.to_pooling_params()
Collaborator:

Suggested change
pooling_params = request.to_pooling_params()
pooling_params = request.to_pooling_params()

DarkLight1337 (Collaborator) left a comment:

Fix test

Comment on lines +129 to +132

responses_float = embedding_client.embeddings.create(
    input=input_texts, model=model_name, encoding_format="float")

responses_base64 = embedding_client.embeddings.create(
Collaborator:

Suggested change

-responses_float = embedding_client.embeddings.create(
-    input=input_texts, model=model_name, encoding_format="float")
-responses_base64 = embedding_client.embeddings.create(
+responses_float = await embedding_client.embeddings.create(
+    input=input_texts, model=model_name, encoding_format="float")
+responses_base64 = await embedding_client.embeddings.create(
