Add the possibility to use offline models (maybe via ollama) #4424

Open
1 task done
gawlk opened this issue Oct 14, 2023 · 28 comments
Labels
ai (Improvement related to Assistant, Copilot, or other AI features), enhancement [core label]

Comments

@gawlk

gawlk commented Oct 14, 2023

Check for existing issues

  • Completed

Describe the feature

Hi,

Having the possibility to use other models, for example Llama (most likely via ollama), would be really amazing, instead of being forced to use the proprietary and unethical ChatGPT.

Here's a link to their API docs: https://github.com/jmorganca/ollama/blob/main/docs/api.md

Since an API is also used for ChatGPT, it shouldn't be too much work.
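For reference, a request against the linked ollama API is roughly the following; the model name is just an example of something pulled locally, so treat this as a sketch based on those docs:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'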

If applicable, add mockups / screenshots to help present your vision of the feature

No response

@gawlk gawlk added the admin read (Pending admin review), enhancement [core label], and triage (Maintainer needs to classify the issue) labels Oct 14, 2023
@JosephTLyons JosephTLyons added the ai (Improvement related to Assistant, Copilot, or other AI features) label and removed the triage (Maintainer needs to classify the issue) and admin read (Pending admin review) labels Oct 16, 2023
@JosephTLyons JosephTLyons transferred this issue from zed-industries/community Jan 24, 2024
@willtejeda

This would be awesome.

This project seems like it could serve as inspiration for the feature:

https://github.com/continuedev/continue

@tjohnman

tjohnman commented Feb 8, 2024

It should be easy to add support for any service that provides an OpenAI-compatible API, such as Perplexity, or LiteLLM for local models.
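To illustrate what "OpenAI-compatible" buys you: a client only needs a configurable base URL, because the request shape stays the same across such services. A rough sketch, where the URL, key, and model name are all placeholders:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-placeholder" \
  -d '{
    "model": "local-model-name",
    "messages": [{"role": "user", "content": "Hello"}]
  }'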

@octoth0rpe

ollama is now OpenAI-compatible as well: https://ollama.ai/blog/openai-compatibility

@shaqq

shaqq commented Feb 14, 2024

A quantized version of CodeLlama would work well locally on Macs:

https://huggingface.co/TheBloke/CodeLlama-34B-GGUF

ollama has a way of interacting with a quantized CodeLlama, but it's up to the Zed team whether they'd rather use ollama or run llama.cpp within Zed (ollama runs llama.cpp under the hood).

IMO this should be more generic than "offline vs. online", and more about giving users a choice of which Copilot model they'd like to use. There's a balance for sure!

@Belluxx

Belluxx commented Feb 19, 2024

A low-effort approach to adding this feature is allowing the configuration of a custom proxy for Copilot.
It is already possible today in VSCode's Copilot extension using this config:

"github.copilot.advanced": {
    "debug.testOverrideProxyUrl": "http://localhost:5001",
    "debug.overrideProxyUrl": "http://localhost:5001"
}

It would be really cool to have this tweak available in Zed too.

@sumanmichael

ollama is now OpenAI-compatible as well: https://ollama.ai/blog/openai-compatibility

This is how it's working for me:

I couldn't add my custom model to the "default_open_ai_model" setting; for now, Zed only allows the OpenAI model names ("gpt-3.5-turbo-0613", "gpt-4-0613", "gpt-4-1106-preview"), so I had to clone the model under one of those names to proxy it.

Pull and run the Mistral model from the Ollama library, then clone it as gpt-4-1106-preview:

ollama run mistral
ollama cp mistral gpt-4-1106-preview

Add this to the Zed settings (~/.config/zed/settings.json):

"assistant": {
    "openai_api_url": "http://localhost:11434/v1"
}

Restart Zed.
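An optional sanity check: hitting Ollama's OpenAI-compatible endpoint directly with the cloned model name should return a completion from the local model before involving Zed at all.

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4-1106-preview",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'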

@Belluxx Belluxx mentioned this issue Feb 22, 2024
@Belluxx

Belluxx commented Feb 23, 2024

@sumanmichael Thank you for the tip. Can you confirm that this only works for the Assistant Panel (chat) and Inline Assist?
I tried to use it for Copilot (theoretically supported, since Copilot uses the OpenAI APIs) but with no success, because I don't have a Microsoft/Copilot account.

Is there a way to bypass Zed's login requirement to use Copilot?

@taylorgoolsby

Can this also be made to work with any local server running an API similar to OpenAI's API? Specifically, I'm interested in using LM Studio.
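LM Studio's local server also exposes an OpenAI-style API (served by default at http://localhost:1234/v1, as noted later in this thread), so in principle the same workaround should apply; this is an untested sketch:

"assistant": {
    "openai_api_url": "http://localhost:1234/v1"
}

The served model would still need to answer to one of the OpenAI model names Zed currently accepts.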

@JayGhiya

Just integrate continue.dev, please. This would exponentially increase adoption, as Continue solves all of the LLM worries and works with all possible providers, both local and cloud.

@oxaronick

Can this also be made to work with any local server running an API similar to OpenAI's API? Specifically, I'm interested in using LM Studio.

Same here. I'm using LiteLLM, which presents an OpenAI-compatible API and integrates with a bunch of model loaders on the back end (ollama, TGI, etc.). It would be nice to be able to just set a URL and token and have Zed use my server.
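For anyone experimenting, the LiteLLM proxy quick start is roughly the line below; the exact flags and the default port vary by LiteLLM version, so verify against its docs before relying on this:

litellm --model ollama/mistral

Zed's "openai_api_url" would then point at whatever base URL the proxy prints on startup.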

@oxaronick

Just integrate continue.dev, please. This would exponentially increase adoption, as Continue solves all of the LLM worries and works with all possible providers, both local and cloud.

Yeah, Continue is very flexible. This is what the Continue config looks like:

  "models": [
    {
      "title": "mixtral",
      "provider": "openai",
      "model": "mixtral:8x7b",
      "apiBase": "https://skynet.becomes.self.aware.io:444",
      "apiKey": "sk-somethingsomething"
    }
  ]

I suppose if Zed only supports the OpenAI API it wouldn't need provider, but otherwise that's a nice way to let users configure their server.

@jianghoy

(quoting @oxaronick's Continue config example above)

So I suppose a couple of QoL improvements could be made here, assuming all the LLMs below are compliant with OpenAI's spec (a rough sketch of what this could look like follows this list):

  • Zed should be able to store multiple LLM configs, following the schema used by continue.dev, which IMHO looks pretty robust.
  • Zed should be able to toggle between multiple LLM providers using the configs provided by users.
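A rough, purely hypothetical sketch of such a settings schema; none of these field names come from Zed, they're borrowed from the continue.dev config above:

"assistant": {
    "providers": [
      {
        "title": "local-mixtral",
        "provider": "openai",
        "model": "mixtral:8x7b",
        "api_url": "http://localhost:11434/v1"
      },
      {
        "title": "openai-gpt4",
        "provider": "openai",
        "model": "gpt-4-1106-preview",
        "api_url": "https://api.openai.com/v1"
      }
    ],
    "default_provider": "local-mixtral"
}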

@oxaronick

Zed should be able to toggle between multiple LLM providers using configs provided by users.

As a starting point, even the ability to configure one model hosted locally or on my server would be great.

@taylorgoolsby

Also, since OpenAI is so proprietary, I do not really feel comfortable with the idea that all these open-source/open-weight models are copying the OpenAI API spec. It would not surprise me if, in the future, an open standard is created rather than relying on OpenAI to set the standard. I'm not saying we should decide on an open spec right here, right now, but I just wanted to point this out and emphasize the need for simplicity.

@franz101

Also, since OpenAI is so proprietary, I do not really feel comfortable with the idea that all these open-source/open-weight models are copying the OpenAI API spec. [...]

The OpenAI API spec is already the norm for many libraries.
Ollama has a PR to add a list-models endpoint soon:
ollama/ollama#2476

I would propose keeping it simple and allowing custom model names whenever the API URL is anything other than OpenAI's.

@thosky

thosky commented Mar 4, 2024

(quoting @sumanmichael's ollama setup above: clone mistral as gpt-4-1106-preview and point openai_api_url at http://localhost:11434/v1)

I have tried this solution with Mistral running locally with ollama. It doesn't work for me. Did anybody else actually make this work?

@janerikmai

janerikmai commented Mar 4, 2024

(quoting @thosky's question above: "I have tried this solution with Mistral running locally with ollama. It doesn't work for me. Did anybody else actually make this work?")

Works for me as described, using codellama:7b-instruct or mistral. I just set up a short test repo for testing.
(Screenshots: the Zed assistant responding via the local model, 2024-03-04.)

@taylorgoolsby

taylorgoolsby commented Mar 5, 2024

How would Zed know what a model's token limit is?

Also, as a side note, some models use different tokenizers. Some well-known ones are BPE, SentencePiece, and CodeGen. Counting tokens with the wrong tokenizer would produce inaccurate counts.

@thosky

thosky commented Mar 6, 2024

@janerikmai
Thanks for confirming it's working. It's working for me as well now, after reinstalling Ollama. The old version might not have supported the OpenAI-compatible API.

@james-haddock

james-haddock commented Mar 13, 2024

If you want to use another model available on Hugging Face that isn't native to Ollama (DeepSeek-Coder, WizardCoder, etc.), you can create it from a GGUF file via a Modelfile and explicitly give it a name that is compatible with Zed:

ollama create gpt-4-1106-preview -f Modelfile
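A minimal Modelfile for this can be as small as a single FROM line pointing at the downloaded GGUF; the filename below is only an example, and TEMPLATE/PARAMETER lines can be added per the Ollama Modelfile docs:

FROM ./deepseek-coder-6.7b-instruct.Q5_K_M.gguf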

@Krukov

Krukov commented Mar 15, 2024

After #8646, to make a local LLM work you need to add this to the Zed settings (~/.config/zed/settings.json):

  "assistant": {
    "provider": {
      "type": "openai",
      "api_url": "http://localhost:11434/v1"
    }
  }

At least it works for me

@versecafe

How is this working with the OpenAI calls for ada embeddings? Or is that just non-functional?

crates/ai/src/providers/open_ai/embeddings.rs

impl OpenAiEmbeddingProvider {
    pub async fn new(client: Arc<dyn HttpClient>, executor: BackgroundExecutor) -> Self {
        let (rate_limit_count_tx, rate_limit_count_rx) = watch::channel_with(None);
        let rate_limit_count_tx = Arc::new(Mutex::new(rate_limit_count_tx));

        // Loading the model is expensive, so ensure this runs off the main thread.
        let model = executor
            .spawn(async move { OpenAiLanguageModel::load("text-embedding-ada-002") })
            .await;
        let credential = Arc::new(RwLock::new(ProviderCredential::NoCredentials));

        OpenAiEmbeddingProvider {
            model,
            credential,
            client,
            executor,
            rate_limit_count_rx,
            rate_limit_count_tx,
        }
    }
    // ... additional code
 }

@andreicek

For what it's worth, this is what I needed to do to make it work locally:

  "assistant": {
    "version": "1",
    "provider": {
      "name": "openai",
      "api_url": "http://localhost:11434/v1"
    }
  }

Then pick a model: ollama run mistral and then ollama cp mistral gpt-4-turbo-preview. Restart Zed and enjoy.

@craigcomstock

craigcomstock commented Mar 28, 2024

I tried this, and I get a prompt to enter an OpenAI API key. I seem blocked even though I have the assistant config mentioned earlier.

If I enter a junk key, it doesn't help; I then get errors about not being able to connect to OpenAI. So the settings apparently aren't being applied?

Zed version: 0.128.3
My $HOME/.config/zed/settings.json:

{
  "ui_font_size": 16,
  "buffer_font_size": 16,
  "assistant": {
    "version": "1",
    "provider": {
      "name": "openai",
      "api_url": "http://localhost:11434/v1"
    }
  }
}

After I add a valid OpenAI API key, things seem to work. I choose gpt-4-turbo, which I set up with ollama: ollama cp mistral gpt-4-turbo-preview.

With that, I try a prompt in a file, and it seems my local LLM is too slow; Zed says: "request or operation took longer than the configured timeout time". I don't get any autocomplete suggestions for a timeout setting in the assistant or provider JSON entries...

I see this timeout is a known issue: #9913

@duggan

duggan commented Mar 28, 2024

Looks like there are some new required settings; this does the trick for me:

  "assistant": {
    "version": "1",
    "provider": {
      "name": "openai",
      "type": "openai",
      "default_model": "gpt-4-turbo-preview",
      "api_url": "http://localhost:11434/v1"
    }
  }
  

Edit: maybe I just lost the API key after a restart 😅

craigcomstock added a commit to craigcomstock/zed that referenced this issue Mar 28, 2024
related issues:
zed-industries#9913 # assistant timeout
zed-industries#4424 # add the possibility to use offline models (maybe via ollama)

I am using ollama with the mistral model.

My local settings.json:

{
  "theme": "One Dark",
  "ui_font_size": 16,
  "buffer_font_size": 16,
  "assistant": {
    "version": "1",
    "provider": {
      "name": "openai",
      "api_url": "http://localhost:11434/v1"
    }
  }
}
@Moshyfawn Moshyfawn mentioned this issue Apr 4, 2024
@ednanf

ednanf commented Apr 13, 2024

Here is a complete rundown of how I got it to work, after collecting all the pieces of information in this thread.
Note: I run Ollama on a Mac; if you run anything differently, you'll have to adapt these steps.

  1. Add the following configuration to the Zed config:

"assistant": {
    "version": "1",
    "provider": {
      "name": "openai",
      "type": "openai",
      "default_model": "gpt-4-turbo-preview",
      "api_url": "http://localhost:11434/v1"
    }
  }

  2. Download Mistral via the ollama CLI:

ollama run mistral

  3. Copy the downloaded Mistral model, changing its name:

ollama cp mistral gpt-4-turbo-preview

  4. Add the OpenAI API key to Zed (source); the key entered here is simply:

ollama

  5. Restart Zed to ensure everything is working as it is supposed to.

I hope this helps!

@matthieuHenocque

I wish Zed could provide an easy way to point at a server API, like the Continue extension does. Here is an example from my config file in VSCodium:

"models": [
    {
      "title": "Mistral",
      "model": "mistral-7b-instruct-v0.2-code-ft.Q5_K_M",
      "contextLength": 4096,
      "provider": "lmstudio"
    }
  ],
"tabAutocompleteModel": {
    "title": "Tab Autocomplete Model",
    "provider": "lmstudio",
    "model": "mistral-7b-instruct-v0.2-code-ft.Q5_K_M",
    "apiBase": "http://localhost:1234/v1/"
  },

LM Studio supports multiple endpoints for different contexts:

GET /v1/models
POST /v1/chat/completions
POST /v1/embeddings            
POST /v1/completions

I followed @sumanmichael's steps (thank you so much) to make my local Mistral work, but while the chat box works flawlessly, the code completion is still messy compared to the exact same model running through Continue in VSCodium.

Being able to manually set an API endpoint, instead of letting Zed concatenate '/completion' onto an assumed OpenAI server HTTP address, would allow any type of installation.
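As a quick reachability check against that default base URL, listing the models LM Studio has loaded should work; this is just a sanity-check suggestion:

curl http://localhost:1234/v1/models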

@psyv282j9d

(quoting @oxaronick's Continue config example above)

FWIW, I managed to get Continue to generate this very useful config fragment for using Ollama:

"models": [
    {
      "model": "AUTODETECT",
      "title": "Ollama",
      "completionOptions": {},
      "apiBase": "http://localhost:11434",
      "provider": "ollama"
    }
  ],

You need to restart VSCode when you add a model to Ollama, but at least you don't need to add another config fragment... very nice.
