A clean, practical configuration for using Continue (VS Code) with local AI models via Ollama: fully offline, with no API keys and no paid tools.
GitHub: https://github.com/xoxxel
This project provides a structured configuration for building your own local AI coding assistant inside VS Code using Continue. Instead of relying on paid services like Copilot or Cursor, this setup allows you to:
- Run models locally (privacy-first)
- Customize behavior (chat, edit, autocomplete, etc.)
- Optimize performance based on your system
- Build your own lightweight or advanced AI agent
Each model is assigned a role:
- chat → conversation & reasoning
- autocomplete → fast code completion
- edit → modify code
- apply → apply changes to files
- rerank → improve search/context accuracy
- embed → search & indexing
You can assign different models to each role depending on your system power.
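Role assignment lives in Continue's config file (typically `~/.continue/config.yaml`). As a rough sketch — the exact schema can differ between Continue releases, so treat field names as an assumption and check the extension's docs — a model is given one or more roles like this:

```yaml
# Sketch of a Continue config.yaml entry (schema may vary by version)
models:
  - name: Qwen 2.5 Coder            # display name shown in VS Code
    provider: ollama                 # route requests to the local Ollama server
    model: qwen2.5-coder:1.5b-base   # Ollama model tag you pulled earlier
    roles:
      - autocomplete                 # use this model only for completions
```

A single model may carry several roles (e.g. `edit` and `apply`), which is how the lighter setups below reuse one small model for most tasks.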
**Requirements**

- VS Code
- Continue Extension
- Ollama installed
- Minimum 8GB RAM (16GB+ recommended)
```shell
ollama pull qwen2.5-coder:1.5b-base
ollama pull qwen2.5:3b
ollama pull llama3.1:8b
ollama pull deepseek-coder-v2:16b
ollama pull nomic-embed-text
```

Below are recommended models for each role based on system capability.
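After pulling, you can sanity-check that the models downloaded and that the Ollama server is reachable (this assumes the default Ollama port, 11434):

```shell
# List the models stored locally
ollama list

# Quick smoke test: ask the small chat model for a reply
ollama run qwen2.5:3b "Say hello in one word."

# Continue talks to Ollama over HTTP; confirm the server responds
curl http://localhost:11434/api/tags
```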
**Chat**

| System | Recommended Models | Notes |
|---|---|---|
| 8GB RAM | qwen2.5:3b | Fast, lightweight |
| 16GB RAM | qwen2.5:3b / llama3.2:3b | Balanced |
| 32GB+ RAM | llama3.1:8b | Best quality |
**Autocomplete**

| System | Recommended Models | Notes |
|---|---|---|
| 8GB RAM | qwen2.5-coder:1.5b-base ✅ | Best choice |
| 16GB RAM | qwen2.5-coder:1.5b-base | Still optimal |
| 32GB+ RAM | qwen2.5-coder:1.5b-base | No heavier model needed |
**Edit**

| System | Recommended Models | Notes |
|---|---|---|
| 8GB RAM | qwen2.5-coder:1.5b-base | Only light edits |
| 16GB RAM | qwen2.5-coder / llama3.1:8b | Medium tasks |
| 32GB+ RAM | deepseek-coder-v2:16b | Best for large edits |
**Apply**

| System | Recommended Models | Notes |
|---|---|---|
| 8GB RAM | qwen2.5-coder:1.5b-base | Fast & simple |
| 16GB RAM | qwen2.5-coder:1.5b-base | Stable |
| 32GB+ RAM | deepseek-coder-v2:16b | Safer for complex changes |
**Rerank**

| System | Recommended Models | Notes |
|---|---|---|
| 8GB RAM | ❌ Skip | Not necessary |
| 16GB RAM | llama3.1:8b | Acceptable |
| 32GB+ RAM | deepseek-coder-v2:16b | Best accuracy |
**Embed**

| System | Recommended Models | Notes |
|---|---|---|
| All Systems | nomic-embed-text ✅ | Best option |
**8GB RAM**

- chat → qwen2.5:3b
- autocomplete → qwen2.5-coder:1.5b-base
- edit/apply → qwen2.5-coder:1.5b-base
- embed → nomic-embed-text
- rerank → disabled
**16GB RAM**

- chat → qwen2.5:3b
- autocomplete → qwen2.5-coder:1.5b-base
- edit → qwen2.5-coder:1.5b-base / llama3.1:8b
- apply → qwen2.5-coder:1.5b-base
- rerank → llama3.1:8b
- embed → nomic-embed-text
**32GB+ RAM**

- chat → llama3.1:8b
- autocomplete → qwen2.5-coder:1.5b-base
- edit/apply → deepseek-coder-v2:16b
- rerank → deepseek-coder-v2:16b
- embed → nomic-embed-text
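As an illustration, the 16GB setup above might translate into a `config.yaml` along these lines. This is a sketch only: the field names follow Continue's YAML config format as of recent versions and may change between releases, so verify against the extension's documentation.

```yaml
# Hypothetical ~/.continue/config.yaml for the 16GB tier
name: Local Ollama Assistant
version: 1.0.0
schema: v1
models:
  - name: Qwen 2.5 3B
    provider: ollama
    model: qwen2.5:3b
    roles: [chat]
  - name: Qwen 2.5 Coder
    provider: ollama
    model: qwen2.5-coder:1.5b-base
    roles: [autocomplete, edit, apply]
  - name: Llama 3.1 8B
    provider: ollama
    model: llama3.1:8b
    roles: [rerank]
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text
    roles: [embed]
```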
- Use small models for frequent tasks (autocomplete)
- Use large models only when needed (edit / rerank)
- Avoid running too many heavy models simultaneously
- Prefer SSD over HDD
- Keep your config minimal and focused
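One practical way to keep heavy models from sitting in RAM: Ollama keeps a model loaded for a few minutes after each request by default, and the `OLLAMA_KEEP_ALIVE` environment variable controls that window. A quick sketch:

```shell
# Unload models right after each request instead of keeping them resident
export OLLAMA_KEEP_ALIVE=0

# See which models are currently loaded in memory
ollama ps
```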
- Large models (16B) may be slow on CPU-only systems
- First run may take time (model loading)
- Local setup = zero cost + full privacy
Feel free to improve this config and submit a pull request.
If this project helps you, give it a star ⭐