Best way to use an OpenAI-compatible embedding API #11809

BeautyyuYanli · 2024-03-11T09:54:32Z

BeautyyuYanli
Mar 11, 2024

Hello everyone!

I'm using my own OpenAI-compatible embedding API, the runnable code:

from llama_index.embeddings.openai import OpenAIEmbedding

emb_model = OpenAIEmbedding(
    api_key="DUMMY_API_KEY",
    api_base="http://192.168.0.1:8000",
    model_name="intfloat/multilingual-e5-large",
)
emb = emb_model.get_text_embedding("hello world")
print(emb)

But if I use the arg model instead of model_name, It will raise:

ValueError: 'intfloat/multilingual-e5-large' is not a valid OpenAIEmbeddingModelType

I checked the code in llama_index/embeddings/openai/base.py,

class OpenAIEmbedding(BaseEmbedding):
    ...
    def __init__(
        ...
        self._query_engine = get_engine(mode, model, _QUERY_MODE_MODEL_DICT)
        self._text_engine = get_engine(mode, model, _TEXT_MODE_MODEL_DICT)

        if "model_name" in kwargs:
            model_name = kwargs.pop("model_name")
            self._query_engine = self._text_engine = model_name
        else:
            model_name = model

So I'm not sure if it is a good idea to set the model_name arg to use my custom OpenAI-compatible API? If not, is there a better way? If yes, maybe we need to add some documents?

Answered by logan-markewich

Mar 4, 2025

@BeautyyuYanli @WinPooh32 @david1542 @juanluisrosaramos @vikrantdeshpande09876

Here's the non-langchain method for using openai-like API's

pip install llama-index-llms-openai-like llama-index-embeddings-openai

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai_like import OpenAILike

embed_model = OpenAIEmbedding(
  model="some model", # use `model` instead of `model_name` -- janky I know
  api_base="...",
  api_key="fake",
  embed_batch_size=10,
)

llm = OpenAILike(
  model_name="my model",
  api_key="fake",
  api_base="...",
  # context window should match whatever llm you are using
  context_window=32000,  
  # specifies whether or not to use chat c…

View full answer

@BeautyyuYanli · 2024-03-11T09:54:45Z

dosubot[bot]
bot Mar 11, 2024

Hey there @BeautyyuYanli! Good to see you back in the world of embeddings 🌐🦙

Based on the code and context you've provided, it seems like you're on the right track. Using the model_name argument to specify your custom OpenAI-compatible API is a valid approach. This is because the model_name parameter allows for directly specifying the embedding engine without going through the mapping logic that uses the model parameter.

In the OpenAIEmbedding class, if model_name is provided in the keyword arguments (kwargs) during the initialization, it overrides the engine selection logic based on the model parameter. Both the _query_engine and _text_engine attributes are set directly to the value of model_name. This provides flexibility for users who know exactly which engine they want to use or need to use an engine that doesn't fit the predefined model types.

Here's the relevant part of the code from llama_index/embeddings/openai/base.py:

class OpenAIEmbedding(BaseEmbedding):
    ...
    def __init__(
        ...
        self._query_engine = get_engine(mode, model, _QUERY_MODE_MODEL_DICT)
        self._text_engine = get_engine(mode, model, _TEXT_MODE_MODEL_DICT)

        if "model_name" in kwargs:
            model_name = kwargs.pop("model_name")
            self._query_engine = self._text_engine = model_name
        else:
            model_name = model

As for the documentation, I agree with you. It would be beneficial to have more detailed documentation about the usage of model and model_name parameters in the OpenAIEmbedding class. This would help users to better understand how to use these parameters and how they affect the behavior of the class. I recommend opening an issue in the LlamaIndex repository to suggest this improvement.

I hope this helps! If you have any more questions, feel free to ask.

Sources

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

1 reply

BeautyyuYanli Mar 11, 2024
Author

Do you generate the answer by your illusion?

Searcherr · 2024-05-06T05:38:06Z

Searcherr
May 6, 2024

Hello everyone!

I'm using my own OpenAI-compatible embedding API, the runnable code:

from llama_index.embeddings.openai import OpenAIEmbedding

emb_model = OpenAIEmbedding(
    api_key="DUMMY_API_KEY",
    api_base="http://192.168.0.1:8000",
    model_name="intfloat/multilingual-e5-large",
)
emb = emb_model.get_text_embedding("hello world")
print(emb)

But if I use the arg model instead of model_name, It will raise:

ValueError: 'intfloat/multilingual-e5-large' is not a valid OpenAIEmbeddingModelType

I checked the code in llama_index/embeddings/openai/base.py,

class OpenAIEmbedding(BaseEmbedding):
    ...
    def __init__(
        ...
        self._query_engine = get_engine(mode, model, _QUERY_MODE_MODEL_DICT)
        self._text_engine = get_engine(mode, model, _TEXT_MODE_MODEL_DICT)

        if "model_name" in kwargs:
            model_name = kwargs.pop("model_name")
            self._query_engine = self._text_engine = model_name
        else:
            model_name = model

So I'm not sure if it is a good idea to set the model_name arg to use my custom OpenAI-compatible API? If not, is there a better way? If yes, maybe we need to add some documents?

Greetings!

Did you find the way to solve this issue?

Thank you in advance!

1 reply

juanluisrosaramos May 6, 2024

But this model is in hugginface no? if yes you need to load it with
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Searcherr · 2024-05-06T17:46:01Z

Searcherr
May 6, 2024

But this model is in hugginface no? if yes you need to load it with from llama_index.embeddings.huggingface import HuggingFaceEmbedding
Thank you so much!

What if I created OpenAI-like web application with huggingface models?
So I want to call this API with, for example OpenAIEmbedding class (or similar). But change base_url to my endpoint and model_name to one of the huggingface models?

Thank you in advance!

0 replies

david1542 · 2024-06-29T08:57:17Z

david1542
Jun 29, 2024

Did anyone find a solution to this problem?

2 replies

Searcherr Jun 29, 2024

Some kind of yes. Using OllamaEmbedding object instead:
https://docs.llamaindex.ai/en/stable/examples/embeddings/ollama_embedding/

vikrantdeshpande09876 Nov 6, 2024

You might wanna check out TextEmbed.
It seems to support OpenAI-compatible Embedding API format.
However, I noticed the source code needed a base-URL change from /embedding to /embeddings.
Ended up recreating TextEmbed as my own custom Embedding class.

WinPooh32 · 2025-03-03T16:51:50Z

WinPooh32
Mar 3, 2025

I have found solution for OpenAI-compatible APIs. In my case it's LiteLLM as frontend and ollama as a backend api. Solution is just using langchain openai clients:

from llama_index.embeddings.langchain import LangchainEmbedding
from llama_index.llms.langchain import LangChainLLM
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

embed_model = LangchainEmbedding(
    langchain_embeddings=OpenAIEmbeddings(
        model="snowflake-arctic-embed2:latest",
        dimensions=1024,
        base_url="http://127.0.0.1:4000",
        api_key="key",
        timeout=500,
    ),
)

llm = LangChainLLM(
    llm=ChatOpenAI(
        model="qwen2.5-coder:32b-instruct-q4",
        base_url="http://127.0.0.1:4000",
        api_key="key",
        disable_streaming="tool_calling",
    )
)

Libs to install:

pip install llama-index-embeddings-langchain
pip install llama-index-llms-langchain
pip install langchain_openai
pip install langchain
pip install langchain_community

0 replies

logan-markewich · 2025-03-04T16:40:05Z

logan-markewich
Mar 4, 2025
Maintainer

@BeautyyuYanli @WinPooh32 @david1542 @juanluisrosaramos @vikrantdeshpande09876

Here's the non-langchain method for using openai-like API's

pip install llama-index-llms-openai-like llama-index-embeddings-openai

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai_like import OpenAILike

embed_model = OpenAIEmbedding(
  model="some model", # use `model` instead of `model_name` -- janky I know
  api_base="...",
  api_key="fake",
  embed_batch_size=10,
)

llm = OpenAILike(
  model_name="my model",
  api_key="fake",
  api_base="...",
  # context window should match whatever llm you are using
  context_window=32000,  
  # specifies whether or not to use chat completions endpoint
  is_chat_model=True,
  # specifies if function calling over the API is supported
  is_function_calling_model=False,
)

4 replies

WinPooh32 Mar 5, 2025

Ollama doesn't support streaming when tools are called. That's why LiteLLM fails to get response from Ollama, when I use OpenAILike. Also there are no options to disable streaming at this client. In my case langchain clients work perfectly.

Error from litellm:

litellm-1  | 10:00:06 - LiteLLM:ERROR: ollama_chat.py:508 - LiteLLM.ollama(): Exception occured - sequence item 18: expected str instance, NoneType found
litellm-1  | Traceback (most recent call last):
litellm-1  |   File "/usr/lib/python3.13/site-packages/litellm/llms/ollama_chat.py", line 482, in ollama_async_streaming
litellm-1  |     response_content = first_chunk_content + "".join(content_chunks)

My confing:

llm = OpenAILike(
    model="qwen2.5-coder:32b-instruct-q4",
    api_key="somekey",
    api_base="http://127.0.0.1:4000/v1",
    # context window should match whatever llm you are using
    context_window=8191,
    # specifies whether or not to use chat completions endpoint
    is_chat_model=True,
    # specifies if function calling over the API is supported
    is_function_calling_model=True,
    timeout=500,
)

WinPooh32 Mar 5, 2025

I have connected Ollama to LiteLLM using OpenAI-api and problem with OpenAILike is gone.
BerriAI/litellm#7094 (comment)

WinPooh32 Mar 5, 2025

But it's impossible to set non-openai model when OpenAIEmbedding client is used due to enum:

    key = (OpenAIEmbeddingMode(mode), OpenAIEmbeddingModelType(model))
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/enum.py", line 757, in __call__
    return cls.__new__(cls, value)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/enum.py", line 1171, in __new__
    raise ve_exc
ValueError: 'nomic-embed-text:v1.5' is not a valid OpenAIEmbeddingModelType

My code:

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model="nomic-embed-text:v1.5",
    dimensions=768, 
    embedding_ctx_length=8191,
    api_base="http://127.0.0.1:4000",
    api_key="somekey",
)

logan-markewich Mar 7, 2025
Maintainer

@BeautyyuYanli sounds like an issue for litellm then? Our official Ollama llm class works fine with streaming tool calls 😁

And my bad for the openai embeddings, you should use model_name= not model=

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Best way to use an OpenAI-compatible embedding API #11809

{{title}}

Replies: 6 comments 8 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

About Dosu

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Best way to use an OpenAI-compatible embedding API #11809

Replies: 6 comments · 8 replies

dosubot[bot] bot Mar 11, 2024

Sources

About Dosu

BeautyyuYanli Mar 11, 2024 Author

logan-markewich Mar 4, 2025 Maintainer

logan-markewich Mar 7, 2025 Maintainer

Replies: 6 comments 8 replies

dosubot[bot]
bot Mar 11, 2024

BeautyyuYanli Mar 11, 2024
Author

logan-markewich
Mar 4, 2025
Maintainer

logan-markewich Mar 7, 2025
Maintainer