[Frontend] OpenAI base64 embedding: remove the message blocker for base64 embedding #5935
base: main
Conversation
To speed up the CI queue, I've cancelled the distributed tests for the latest CI run in this PR since they won't pass anyway until #5905 has been merged. Now that it has been merged, please merge the latest main into this branch.
Force-pushed the branch from ba6f3a2 to 2e1d81c (compare).
@DarkLight1337 I rebased onto main and things are looking better - still one check is running.
Have you verified whether the result is correct or not? (Just because it returns a value doesn't mean it's correct.)
Yes, I compared the results from base64 and float.
I'm asking because the results you posted above seem to be different. It would be great if you could set up a test case for this!
I see. The reason they look different is that the inputs are different in Test 1 and Test 2.

Makes sense - I am adding unit tests to make sure they cover both float and base64.
3a9900a
to
a73a7d5
Compare
examples/openai_embedding_client.py (Outdated)

    encoding_format="base64")
...
assert responses_float.data == responses_base64.data
From my understanding, the returned data should be base64 encoded if you pass encoding_format="base64", so you should be decoding its output before comparing them.
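The suggested decode step can be sketched as follows. This is a hypothetical helper, not code from the PR; it assumes the base64 payload is the raw bytes of a float32 array, which must match whatever dtype the server packs.

```python
import base64

import numpy as np


def decode_base64_embedding(data: str, dtype: str = "float32") -> list:
    # base64 string -> raw bytes -> float array -> Python list
    return np.frombuffer(base64.b64decode(data), dtype=dtype).tolist()


# Round-trip check: encode a known float array, then decode it back.
original = [0.5, -1.25, 2.0]
encoded = base64.b64encode(
    np.array(original, dtype="float32").tobytes()).decode("utf-8")
assert decode_base64_embedding(encoded) == original
```

The values chosen are exactly representable in float32, so the round trip compares equal; real embeddings should be compared with a tolerance.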
Interestingly, I manually ran the unit test (with encoding_format=base64, e.g. python3 openai_embedding_client.py) and added a print(obj) at https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L99 (on the client side); the output from print(obj) was actually an array of floats, rather than a base64 string.
The embedding model I use is: https://huggingface.co/intfloat/e5-mistral-7b-instruct
Hmm, perhaps the OpenAI client automatically performs the decoding for you? Nevertheless, we should still perform the encoding on the server side.
Edit: Yeah, that seems to be the case.
https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L102-L110
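A paraphrased sketch of what the linked client-side logic appears to do (this is an assumption based on reading the code, not the actual openai-python implementation): when the caller leaves encoding_format unset, the client requests base64 under the hood and transparently decodes it; an explicit encoding_format bypasses the decode.

```python
import base64
from typing import Optional

import numpy as np


def parse_embeddings_response(obj: dict,
                              user_specified_format: Optional[str]) -> dict:
    if user_specified_format is not None:
        # Caller explicitly chose "float" or "base64": return as-is.
        return obj
    # Caller left it unset: the client asked the server for base64, so
    # transparently decode each embedding back to a float list.
    for item in obj["data"]:
        if isinstance(item["embedding"], str):
            item["embedding"] = np.frombuffer(
                base64.b64decode(item["embedding"]),
                dtype="float32").tolist()
    return obj
```

This matches the observation above: with an explicit encoding_format the decode branch never runs.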
Nevertheless, we should still perform the encoding on server side.
In which part of https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py should we perform the encoding? My understanding is: the input is an array of strings (https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py#L90) with encoding_format=base64; from my test, the response from the server (vLLM) already looks like an array of floats (https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py#L35, print(final_res.outputs.embedding)).
Thanks for the hints.
We should only apply base64 encoding on the outputs (convert the float array into base64).
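The server-side step could look like the sketch below. Function and parameter names here are assumptions for illustration, not vLLM's actual code: the model always produces floats, and base64 is applied only to the output when the request asks for it.

```python
import base64
from typing import List, Union

import numpy as np


def encode_embedding(embedding: List[float],
                     encoding_format: str = "float") -> Union[List[float], str]:
    if encoding_format == "base64":
        # Pack the floats into raw bytes, then base64-encode into a str.
        raw = np.array(embedding, dtype="float32").tobytes()
        return base64.b64encode(raw).decode("utf-8")
    return embedding  # unchanged float list
```

With encoding_format="base64" this returns a str, which is exactly what trips the List[float]-only response schema discussed below.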
I tried to encode the embedding in the response (the latest code checked in); however, adding the base64 encoding failed OpenAI client validation (as follows):
Traceback (most recent call last):
File "/Users/xxx/vllm/examples/openai_embedding_client.py", line 34, in <module>
responses_base64 = client.embeddings.create(input=input, model=model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/resources/embeddings.py", line 117, in create
return self._post(
^^^^^^^^^^^
File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/_base_client.py", line 1250, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/_base_client.py", line 931, in request
return self._request(
^^^^^^^^^^^^^^
File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/_base_client.py", line 1030, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "1 validation error for EmbeddingResponseData\nembedding\n Input should be a valid list [type=list_type, input_value=b'AAAAAADchD8AAAAAABCUPwA...wAAAAAA6JC/AAAAAAAIgD8=', input_type=bytes]\n For further information visit https://errors.pydantic.dev/2.7/v/list_type", 'type': 'BadRequestError', 'param': None, 'code': 400}
It looks like the validation only accepts numbers, instead of the base64 string?
Can you use a debugger to figure out why the code which I linked above is not running to decode the base64 output?
https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L102-L110
Yeah - at https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L98, since we explicitly pass encoding_format=base64, it returns the object without further decoding. However, the returned obj is an array of floats, so it did not run the decode section you linked.
(Maybe I am wrong, but) it seems that whatever the encoding_format is, it has no impact on the input to the server (vLLM): https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py#L90. From many tests, it looks like the model I use (https://huggingface.co/intfloat/e5-mistral-7b-instruct) always outputs an array of floats.
Maybe we should ask https://github.com/CatherineSue why they disabled base64 in e254497?
From my understanding, the base64 encoding should only be applied to the output, not the input.
embedding_data = EmbeddingResponseData(
-    index=idx, embedding=final_res.outputs.embedding)
+    index=idx, embedding=[embedding])
I think here you should either send List[float] (float output) or a str (base64 output).
Thanks for the pointer. I tried with a string (without []), but the OpenAI Python client gave the following error:
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "1 validation error for EmbeddingResponseData\nembedding\n Input should be a valid list [type=list_type, input_value=b'AAAAAADchD8AAAAAABCUPwA...wAAAAAA6JC/AAAAAAAIgD8=', input_type=bytes]\n For further information visit https://errors.pydantic.dev/2.7/v/list_type", 'type': 'BadRequestError', 'param': None, 'code': 400}
*Input should be a valid list* (that was why I changed to an array, but it further checks whether the content of the array is a number).
I think we should update the definition of EmbeddingResponseData to allow both types of outputs.
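The suggested schema change could be sketched like this. The field names are taken from the validation error above; this is a hedged sketch, not the actual vLLM definition.

```python
from typing import List, Union

from pydantic import BaseModel


class EmbeddingResponseData(BaseModel):
    index: int
    # Accept either a float list ("float" format) or a base64 str.
    embedding: Union[List[float], str]


# Both response shapes now validate:
float_item = EmbeddingResponseData(index=0, embedding=[0.1, 0.2])
b64_item = EmbeddingResponseData(index=1, embedding="AAAAAADchD8=")
```

With the Union, the 400 list_type error above goes away because a bytes-decoded base64 str is a valid value for the embedding field.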
Pardon me for joining here.
@DarkLight1337 updated the PR and local unit tests pass. Thanks for the suggestions.
Force-pushed the branch from fce3afb to 5f82658 (compare).
examples/openai_embedding_client.py (Outdated)

decoded_responses_base64_data = []
for data in responses_base64.data:
    decoded_responses_base64_data.append(
        np.frombuffer(base64.b64decode(data.embedding), dtype="float").tolist())
Hmm, so OpenAI doesn't perform automatic decoding anymore?
Yes; from https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L98, it seems that because we explicitly specify the encoding as base64 or float, it does not run the rest of the logic in the parse function.
Ah, I see. Can you add this example file to be explicitly tested in CI? Alternatively @Etelis can implement the test in his PR.
Also, the linter is currently failing; please run bash format.sh to fix such errors locally.
Btw, there is no need to force-push since the commits will be squashed before merging anyway.
Force-pushed the branch from d487e9f to 91da55c (compare).
Cleanup
@@ -20,4 +20,4 @@
    model=model)

for data in responses.data:
    print(data.embedding)  # list of float of len 4096
assert responses_float.data[0].embedding == decoded_responses_base64_data[0]
assert responses_float.data[1].embedding == decoded_responses_base64_data[1]
@@ -141,4 +141,4 @@ def _check_embedding_mode(self, embedding_mode: bool):
    logger.warning(
        "embedding_mode is False. Embedding API will not work.")
else:
    logger.info("Activating the server engine with embedding enabled.")
@@ -89,7 +89,6 @@ async def create_embedding(self, request: EmbeddingRequest,
try:
    prompt_is_tokens, prompts = parse_prompt_format(request.input)
    pooling_params = request.to_pooling_params()
Fix test
responses_float = embedding_client.embeddings.create(
    input=input_texts, model=model_name, encoding_format="float")

responses_base64 = embedding_client.embeddings.create(
Suggested change (await the async client calls):
-responses_float = embedding_client.embeddings.create(
+responses_float = await embedding_client.embeddings.create(
     input=input_texts, model=model_name, encoding_format="float")
-responses_base64 = embedding_client.embeddings.create(
+responses_base64 = await embedding_client.embeddings.create(
Based on my tests, the current release 0.5.0.post1 already handles both float and base64 smoothly; we just need to remove the blocker to enable base64.
Test 1:
Output 1:
Test 2:
Output 2: