
[Frontend]openai base64 embedding: remove the message blocker for base64 embedding #5935

Open · wants to merge 1 commit into base: main
Conversation

llmpros commented Jun 27, 2024

Based on my tests, the current release 0.5.0.post1 is able to handle both float and base64 smoothly; we just need to remove the blocker to enable base64.

Test 1:

from langchain_openai import OpenAIEmbeddings

emb_model = OpenAIEmbeddings(
    model="/data00/e5-mistral-7b-instruct/",
    openai_api_base="http://10.37.78.125:8000/v1",
    openai_api_key="EMPTY")

embedding = emb_model.embed_query("A sentence to encode.")
print(embedding)

Output 1:

[  ......
-0.005718231201171875, 0.01071929931640625]

Test 2 (same client as Test 1, with input ["Hello my name is", "The best thing about vLLM is that it supports many different models"]):

Output 2:

[  ......
0.0242767333984375, 0.0036792755126953125, 0.0306549072265625]

DarkLight1337 (Collaborator) commented Jun 28, 2024

To speed up the CI queue, I've cancelled the distributed tests for the latest CI run in this PR since they won't pass anyway until #5905 has been merged. Now that it has been merged, please merge main into your branch so that the CI can pass once again.

@llmpros llmpros force-pushed the main branch 6 times, most recently from ba6f3a2 to 2e1d81c Compare June 28, 2024 21:37
llmpros (Author) commented Jun 28, 2024

@DarkLight1337 I rebased onto main and it seems things are looking better - still 1 check is running.

DarkLight1337 (Collaborator) commented:

Have you verified whether the result is correct or not? (Just because it returns a value doesn't mean it's correct)

llmpros (Author) commented Jun 29, 2024

> Have you verified whether the result is correct or not? (Just because it returns a value doesn't mean it's correct)

Yes, I compared the results from base64 and float (using the diff command); they are 100% identical. Let me know if you want to see the comparison and I will upload it somewhere.

DarkLight1337 (Collaborator) commented Jun 30, 2024

I'm asking because the results you posted above seem to be different. It would be great if you could set up a test case for this!

llmpros (Author) commented Jun 30, 2024

> I'm asking because the results you posted above seem to be different. It would be great if you could set up a test case for this!

I see. The reason they look different is that the inputs are different: in Test 1, the input is "A sentence to encode.", while in Test 2 the input is ["Hello my name is", "The best thing about vLLM is that it supports many different models"].

Makes sense - I am adding unit tests to make sure they cover both float and base64.

@llmpros llmpros force-pushed the main branch 4 times, most recently from 3a9900a to a73a7d5 Compare June 30, 2024 01:54
encoding_format="base64")


assert responses_float.data == responses_base64.data
Collaborator:

From my understanding, the returned data should be base64 encoded if you pass encoding_format="base64", so you should be decoding its output before comparing them.
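As an illustration, a minimal sketch of such a decode-before-compare step (the float64 dtype here is an assumption; it must match whatever dtype the server packs):

```python
import base64

import numpy as np

# hypothetical floats standing in for a real embedding
expected = [0.25, -0.5, 1.0]

# simulate a base64 response payload (float64 bytes; the dtype is an assumption)
payload = base64.b64encode(np.array(expected, dtype="float").tobytes()).decode("utf-8")

# decode the base64 output back into floats before comparing
decoded = np.frombuffer(base64.b64decode(payload), dtype="float").tolist()
assert decoded == expected
```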

Author:

> From my understanding, the returned data should be base64 encoded if you pass encoding_format="base64", so you should be decoding its output before comparing them.

Interestingly, I manually ran the unit test (with encoding_format="base64", e.g. python3 openai_embedding_client.py) and added a print(obj) at https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L99 (on the client side); the output from print(obj) was actually an array of floats, rather than a base64 string.

The embedding model I use is: https://huggingface.co/intfloat/e5-mistral-7b-instruct

DarkLight1337 (Collaborator) commented Jun 30, 2024:

Hmm, perhaps the OpenAI client automatically performs the decoding for you? Nevertheless, we should still perform the encoding on the server side.

Edit: Yeah, that seems to be the case.

https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L102-L110

Author:

> Nevertheless, we should still perform the encoding on server side.

In which part of https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py should we perform the encoding? My understanding is that the input is an array of strings: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py#L90

With encoding_format=base64, from my test, the response from the server (vllm) already looks like an array of floats: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py#L35 (print(final_res.outputs.embedding)).

Thanks for the hints

DarkLight1337 (Collaborator) commented Jun 30, 2024:

We should only apply base64 encoding to the outputs (converting the float array into base64).
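For illustration, a minimal sketch of that output-side encoding (the helper name and float64 dtype are assumptions, not the PR's actual code):

```python
import base64

import numpy as np

def encode_base64_embedding(embedding):
    """Pack a float array into raw bytes, then base64-encode it for the response."""
    raw = np.array(embedding, dtype="float").tobytes()
    return base64.b64encode(raw).decode("utf-8")

encoded = encode_base64_embedding([0.1, 0.2, 0.3])
```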

Author:

I tried encoding the embedding in the response (the latest code checked in); however, adding the base64 encoding failed OpenAI client validation, as follows:

Traceback (most recent call last):
  File "/Users/xxx/vllm/examples/openai_embedding_client.py", line 34, in <module>
    responses_base64 = client.embeddings.create(input=input, model=model, 
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/resources/embeddings.py", line 117, in create
    return self._post(
           ^^^^^^^^^^^
  File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/_base_client.py", line 1250, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/_base_client.py", line 931, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/Users/bytedance/python-env/lib/python3.11/site-packages/openai/_base_client.py", line 1030, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "1 validation error for EmbeddingResponseData\nembedding\n  Input should be a valid list [type=list_type, input_value=b'AAAAAADchD8AAAAAABCUPwA...wAAAAAA6JC/AAAAAAAIgD8=', input_type=bytes]\n    For further information visit https://errors.pydantic.dev/2.7/v/list_type", 'type': 'BadRequestError', 'param': None, 'code': 400}

It looks like the validation only accepts numbers, not a base64 string?
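The 400 above is pydantic rejecting bytes where a List[float] is declared. A minimal reproduction of that validation error (a hypothetical model, not vLLM's actual class):

```python
from typing import List

from pydantic import BaseModel, ValidationError

class Data(BaseModel):
    embedding: List[float]

try:
    # bytes payload, as in the error message above
    Data(embedding=b"AAAAAADchD8=")
    raised = False
except ValidationError as err:
    raised = True
    # pydantic v2 reports this as a list_type error
    assert "list_type" in str(err)
```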

Collaborator:

Can you use a debugger to figure out why the code which I linked above is not running to decode the base64 output?

llmpros (Author) commented Jun 30, 2024:

> https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L102-L110

Yeah - at https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L98, since we explicitly pass encoding_format="base64", the client returns the object without further decoding. However, the returned obj is an array of floats, so it did not run the decode section you linked.

(Maybe I am wrong, but) it seems that whatever the encoding_format is, it has no impact on the input to the server (vllm): https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_embedding.py#L90. From many tests, it looks like the model I use (https://huggingface.co/intfloat/e5-mistral-7b-instruct) always outputs an array of floats.

Maybe we should ask https://github.com/CatherineSue why they disabled base64 in e254497?

Collaborator:

From my understanding, the base64 encoding should only be applied to the output, not the input.

embedding_data = EmbeddingResponseData(
-    index=idx, embedding=final_res.outputs.embedding)
+    index=idx, embedding=[embedding])
Collaborator:

I think here you should either send List[float] (float output) or a str (base64 output)

Author:

> I think here you should either send List[float] (float output) or a str (base64 output)

Thanks for the pointer. I tried with a string (without []), but the OpenAI python client gave the following error:

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "1 validation error for EmbeddingResponseData\nembedding\n  Input should be a valid list [type=list_type, input_value=b'AAAAAADchD8AAAAAABCUPwA...wAAAAAA6JC/AAAAAAAIgD8=', input_type=bytes]\n    For further information visit https://errors.pydantic.dev/2.7/v/list_type", 'type': 'BadRequestError', 'param': None, 'code': 400}

*Input should be a valid list* - that was why I changed to an array, but it then further checks that the contents of the array are numbers.

Collaborator:

I think we should update the definition of EmbeddingResponseData to allow both types of outputs
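A sketch of what that widened definition could look like (the field names other than embedding are assumptions):

```python
from typing import List, Union

from pydantic import BaseModel

class EmbeddingResponseData(BaseModel):
    index: int
    object: str = "embedding"
    # accept either a float list (encoding_format="float")
    # or a base64 string (encoding_format="base64")
    embedding: Union[List[float], str]

# both output shapes now validate
float_data = EmbeddingResponseData(index=0, embedding=[0.1, 0.2])
base64_data = EmbeddingResponseData(index=1, embedding="AAAAAADchD8=")
```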

Etelis (Contributor) commented Jun 30, 2024

Pardon me for joining in here.
It feels like it's not just about removing the error; rather, we need to add base64 encoding.

llmpros (Author) commented Jun 30, 2024

@DarkLight1337 updated the PR and local unit tests pass. Thanks for the suggestions.

@llmpros llmpros force-pushed the main branch 2 times, most recently from fce3afb to 5f82658 Compare June 30, 2024 07:32
decoded_responses_base64_data = []
for data in responses_base64.data:
    decoded_responses_base64_data.append(
        np.frombuffer(base64.b64decode(data.embedding), dtype="float").tolist())
Collaborator:

Hmm, so OpenAI doesn't perform automatic decoding anymore?

Author:

> Hmm, so OpenAI doesn't perform automatic decoding anymore?

Yes - from https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py#L98, it seems that because we explicitly specify encoding_format as base64 or float, the client does not run the rest of the logic in the parse function.
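A rough sketch of the client-side behavior being described (paraphrasing the linked parser, not its exact code): when the caller explicitly requests base64, the payload is returned untouched; the decode path only runs when the client picked base64 on the caller's behalf. The float64 dtype is an assumption here.

```python
import base64

import numpy as np

def parse_embedding(value, user_requested_base64):
    # caller explicitly asked for base64: hand the payload back as-is
    if user_requested_base64:
        return value
    # otherwise decode the base64 string back into a float list
    if isinstance(value, str):
        return np.frombuffer(base64.b64decode(value), dtype="float").tolist()
    return value

packed = base64.b64encode(np.array([1.5, 2.5], dtype="float").tobytes()).decode("utf-8")
```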

DarkLight1337 (Collaborator) commented Jun 30, 2024:

Ah, I see. Can you add this example file to be explicitly tested in CI? Alternatively @Etelis can implement the test in his PR.

Collaborator:

Also, the linter is currently failing; please run bash format.sh to fix such errors locally.

DarkLight1337 (Collaborator) commented:
Btw, there is no need to force-push since the commits will be squashed before merging anyway.

@llmpros llmpros force-pushed the main branch 4 times, most recently from d487e9f to 91da55c Compare June 30, 2024 08:14
DarkLight1337 (Collaborator) left a comment:

Cleanup

@@ -20,4 +20,4 @@
model=model)

for data in responses.data:
print(data.embedding) # list of float of len 4096
print(data.embedding) # list of float of len 4096
Collaborator:

Suggested change
print(data.embedding) # list of float of len 4096
print(data.embedding) # list of float of len 4096

assert responses_float.data[0].embedding == decoded_responses_base64_data[
0]
assert responses_float.data[1].embedding == decoded_responses_base64_data[
1]
Collaborator:

Suggested change
1]
1]

@@ -141,4 +141,4 @@ def _check_embedding_mode(self, embedding_mode: bool):
logger.warning(
"embedding_mode is False. Embedding API will not work.")
else:
logger.info("Activating the server engine with embedding enabled.")
logger.info("Activating the server engine with embedding enabled.")
Collaborator:

Suggested change
logger.info("Activating the server engine with embedding enabled.")
logger.info("Activating the server engine with embedding enabled.")

@@ -89,7 +89,6 @@ async def create_embedding(self, request: EmbeddingRequest,
try:
prompt_is_tokens, prompts = parse_prompt_format(request.input)
pooling_params = request.to_pooling_params()
Collaborator:

Suggested change
pooling_params = request.to_pooling_params()
pooling_params = request.to_pooling_params()

DarkLight1337 (Collaborator) left a comment:

Fix test

Comment on lines +129 to +132

responses_float = embedding_client.embeddings.create(
    input=input_texts, model=model_name, encoding_format="float")

responses_base64 = embedding_client.embeddings.create(
Collaborator:

Suggested change

-responses_float = embedding_client.embeddings.create(
-    input=input_texts, model=model_name, encoding_format="float")
-responses_base64 = embedding_client.embeddings.create(
+responses_float = await embedding_client.embeddings.create(
+    input=input_texts, model=model_name, encoding_format="float")
+responses_base64 = await embedding_client.embeddings.create(
