Add the possibility to use offline models (maybe via ollama) #4424
Comments
This would be awesome. This project seems like it could serve as inspiration for the feature |
It should be easy to add support to any service that provides an OpenAI-compatible API, such as Perplexity, or LiteLLM for local models. |
ollama is now OpenAI-compatible as well: https://ollama.ai/blog/openai-compatibility |
A quantized version of CodeLlama would work well locally on Macs: https://huggingface.co/TheBloke/CodeLlama-34B-GGUF. Ollama has a way of interacting with a quantized CodeLlama, but it's up to the Zed team whether they'd rather use ollama or run llama.cpp within Zed (ollama runs llama.cpp under the hood). IMO this should be more generic than "offline vs. online", and more about giving users choice in which Copilot model they'd like to use. There's a balance for sure! |
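For reference, pulling and chatting with a quantized CodeLlama through ollama is one command per step; the tag below is illustrative, so check the ollama library for the exact quantization you want:

    ollama pull codellama:34b
    ollama run codellama:34b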
A low effort approach to include this feature is allowing the configuration of a custom proxy for Copilot:

    "github.copilot.advanced": {
        "debug.testOverrideProxyUrl": "http://localhost:5001",
        "debug.overrideProxyUrl": "http://localhost:5001"
    }

It would be really cool to have this tweak available in Zed too |
This is how it's working for me. I couldn't add my custom model to the "default_open_ai_model" setting; for now, Zed only allows the OpenAI model names ("gpt-3.5-turbo-0613", "gpt-4-0613", "gpt-4-1106-preview"), so I had to clone my model under one of those names to proxy it.
1. Pull and run the Mistral model from the Ollama library
2. Add this to my Zed settings (~/.config/zed/settings.json)
3. Restart Zed |
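Pieced together from the settings shown later in this thread, the setup was likely along these lines; the exact key names depend on the Zed version, and the gpt-4 alias is only an illustration of the "clone it" trick:

    ollama pull mistral
    ollama cp mistral gpt-4-1106-preview

Then in ~/.config/zed/settings.json, point the assistant at Ollama's OpenAI-compatible endpoint:

    "assistant": {
        "provider": {
            "type": "openai",
            "api_url": "http://localhost:11434/v1"
        }
    }

After restarting, select the cloned model name in the assistant panel so requests hit the local Mistral.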
@sumanmichael Thank you for the tip. Can you confirm that this only works for "Assistant Panel" (chat) and "Inline Assist"? Is there a way to bypass Zed login requirement to use Copilot? |
Can this also be made to work with any local server running an API similar to OpenAI's API? Specifically, I'm interested in using LM Studio. |
Just integrate continue.dev, please. This will exponentially increase adoption, as Continue solves all of the LLM worries and works with all possible providers, both local and cloud. |
Same here, I'm using LiteLLM which presents an OpenAI-compatible API, and integrates with a bunch of model loaders on the back end (ollama, tgi, etc.). Would be nice to be able to just set a URL and token and have it use my server. |
Yeah, Continue is very flexible. This is what the Continue config looks like:

    "models": [
        {
            "title": "mixtral",
            "provider": "openai",
            "model": "mixtral:8x7b",
            "apiBase": "https://skynet.becomes.self.aware.io:444",
            "apiKey": "sk-somethingsomething"
        }
    ]

I suppose if Zed only supports the OpenAI API it wouldn't need |
So I suppose a couple of QoL improvements can be made here, assuming all the LLMs below are compliant with OpenAI's spec: |
As a starting point, even the ability to configure one model hosted locally or on my server would be great. |
Also, since OpenAI is so proprietary, I do not really feel comfortable with the idea that all these open source/weight models are copying the OpenAI API spec. It would not surprise me if, in the future, an open standard is created rather than relying on OpenAI to set the standard. I'm not saying we should decide on an open spec right here right now, but just wanted to point this out and emphasize a need for simplicity. |
The OpenAI API spec is already the norm for many libraries. I would propose to keep it simple and allow custom model names if the default API is anything other than OpenAI. |
I have tried this solution with Mistral running locally with ollama. It doesn't work for me. Did anybody else actually make this work? |
Works for me as described, using codellama:7b-instruct or mistral. Just did a short test with a small test repo. |
How would Zed know what a model's token limit is? Also, as a side note, some models use different tokenizers. Some well-known ones are BPE, SentencePiece, and CodeGen. Counting tokens using the wrong tokenizer would produce inaccurate counts. |
@janerikmai |
If you want to use another model available on Hugging Face that's not native to Ollama (DeepSeek-Coder, WizardCoder, etc.), you can explicitly name it to be compatible with Zed when creating it from a GGUF via the Modelfile. |
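For example, with a GGUF downloaded from Hugging Face, a Modelfile plus ollama create lets you register the model under a name Zed accepts; the file path and alias below are illustrative:

    # Modelfile
    FROM ./deepseek-coder-6.7b-instruct.Q4_K_M.gguf

    # register and run it under a Zed-compatible name
    ollama create gpt-4-1106-preview -f Modelfile
    ollama run gpt-4-1106-preview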
After #8646, to make a local LLM work you need to add this to the Zed settings (~/.config/zed/settings.json):

    "assistant": {
        "provider": {
            "type": "openai",
            "api_url": "http://localhost:11434/v1"
        }
    }

At least it works for me |
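A quick way to sanity-check that Ollama's OpenAI-compatible endpoint is reachable before pointing Zed at it (this assumes you have already pulled the mistral model):

    curl http://localhost:11434/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "mistral", "messages": [{"role": "user", "content": "Say hello"}]}'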
How is this working with the OpenAI calls for ada embeddings? Or is that just dysfunctional?

    impl OpenAiEmbeddingProvider {
        pub async fn new(client: Arc<dyn HttpClient>, executor: BackgroundExecutor) -> Self {
            let (rate_limit_count_tx, rate_limit_count_rx) = watch::channel_with(None);
            let rate_limit_count_tx = Arc::new(Mutex::new(rate_limit_count_tx));
            // Loading the model is expensive, so ensure this runs off the main thread.
            let model = executor
                .spawn(async move { OpenAiLanguageModel::load("text-embedding-ada-002") })
                .await;
            let credential = Arc::new(RwLock::new(ProviderCredential::NoCredentials));
            OpenAiEmbeddingProvider {
                model,
                credential,
                client,
                executor,
                rate_limit_count_rx,
                rate_limit_count_tx,
            }
        }
        // ... additional code
    } |
For what it's worth, this is what I needed to do to make it work locally:

    "assistant": {
        "version": "1",
        "provider": {
            "name": "openai",
            "api_url": "http://localhost:11434/v1"
        }
    }

Then pick a model: |
I tried this and I get a prompt to enter an OpenAI API key. I seem blocked even though I have the assistant config mentioned earlier. Entering a junk key doesn't help either; then I get errors about not being able to connect to OpenAI. So the settings apparently aren't working? My $HOME/.config/zed/settings.json:
After I add a valid OpenAI API key, things seem to work. I choose gpt-4-turbo, which I set up with ollama. With that I try a prompt in a file, and it seems my local LLM is too slow; Zed says: "request or operation took longer than the configured timeout time". I don't get any auto-complete hints for a timeout entry in the assistant or provider JSON... I see this timeout is a known issue: #9913 |
Edit: maybe I just lost the API key after a restart 😅 |
Related issues:
zed-industries#9913  # assistant timeout
zed-industries#4424  # add the possibility to use offline models (maybe via ollama)

I am using ollama with the mistral model. My local settings.json:

    {
        "theme": "One Dark",
        "ui_font_size": 16,
        "buffer_font_size": 16,
        "assistant": {
            "version": "1",
            "provider": {
                "name": "openai",
                "api_url": "http://localhost:11434/v1"
            }
        }
    }
Here is a complete rundown of how I got it to work after collecting all the pieces of information in this thread:
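(The steps below are a sketch assembled from the configs posted earlier in this thread; the model name and gpt-4 alias are illustrative, and key names vary slightly between Zed versions.)

    ollama pull mistral
    ollama cp mistral gpt-4-1106-preview    # only needed on Zed versions that restrict model names

Then in ~/.config/zed/settings.json:

    "assistant": {
        "version": "1",
        "provider": {
            "name": "openai",
            "api_url": "http://localhost:11434/v1"
        }
    }

Restart Zed and pick the model in the assistant panel.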
I hope this helps! |
I wish Zed could provide an easy way to point to a server API like the Continue extension. Here is an example in my config file in VSCodium.
LM Studio supports multiple endpoints for different contexts:
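The OpenAI-compatible routes LM Studio's local server exposes (on port 1234 by default) are roughly the following; check the LM Studio docs for the authoritative list:

    GET  http://localhost:1234/v1/models
    POST http://localhost:1234/v1/chat/completions
    POST http://localhost:1234/v1/completions
    POST http://localhost:1234/v1/embeddings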
I followed @sumanmichael's steps (thank you so much) to make my local Mistral work, but while the chat box works flawlessly, the code completion is still messy compared to the exact same model running through Continue in VSCodium. Being able to manually set an API endpoint, instead of letting Zed concatenate '/completion' onto an alleged OpenAI server HTTP address, would allow any type of installation. |
FWIW, I managed to get Continue to generate this very useful config fragment for using Ollama:

    "models": [
        {
            "model": "AUTODETECT",
            "title": "Ollama",
            "completionOptions": {},
            "apiBase": "http://localhost:11434",
            "provider": "ollama"
        }
    ],

You need to restart VSCode when you add a model to Ollama, but at least you don't need to add another config fragment... very nice. |
Describe the feature
Hi,
Having the possibility to use other models, for example Llama (most likely via ollama), instead of being forced to use the proprietary and unethical ChatGPT would be really amazing.
Here's a link to their API docs: https://github.com/jmorganca/ollama/blob/main/docs/api.md
Since an API is also used for ChatGPT, it shouldn't be too much work.
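From the linked docs, a generation request is a single HTTP call along these lines (the model name here is just an example):

    curl http://localhost:11434/api/generate -d '{
        "model": "mistral",
        "prompt": "Why is the sky blue?"
    }'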
If applicable, add mockups / screenshots to help present your vision of the feature
No response