local-llm-vllm-ui

Local WSL2 + vLLM setup with a simple FastAPI web UI for model selection, chat, and per-session markdown logs.

Windows-side prerequisites

  • Windows 11 (or Windows 10 22H2) with WSL2 enabled
  • Latest NVIDIA Windows GPU drivers with WSL support
  • VS Code + "Remote - WSL" extension

WSL steps

  1. Open PowerShell and install Ubuntu (if you have not already):
wsl --install -d Ubuntu
  2. Launch Ubuntu and let it finish first-run setup.
  3. From Windows, open this folder in VS Code, then use "Remote - WSL: Reopen Folder in WSL".

GPU check

Inside WSL:

nvidia-smi

You should see your RTX 4070 listed. If not, update NVIDIA drivers on Windows and confirm WSL2 GPU support is enabled.

Setup

From the repo root (inside WSL):

cp .env.example .env
./scripts/setup_wsl_ubuntu.sh
./scripts/setup_webui.sh

Note: zstd is installed by setup_wsl_ubuntu.sh because Ollama uses it for model downloads.

Optional: Ollama setup (for GGUF models)

If you want to use GGUF models with the Ollama backend:

  1. Install Ollama for Linux:
curl -fsSL https://ollama.com/install.sh | sh
  2. Start the service:
ollama serve
  3. Pull the GGUF model you want:
ollama pull QuantFactory/Qwen2.5-7B-Instruct-abliterated-v2-GGUF

Alternatively, run the helper script:

./scripts/setup_ollama.sh

Start the web UI

./scripts/run_webui.sh

Then open the web UI in your browser.

Load a model

  1. Pick a model from the dropdown.
  2. Click "Load Model". The UI will stop any running vLLM server and start a new one with the selected model.
  3. Wait for the status to show "Running".
  4. If the selected model uses the Ollama backend, make sure ollama serve is running.

Note: The web UI will attempt to start Ollama and pull the model automatically when you load an Ollama-backed model.
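If you are not sure whether the Ollama service is up, one quick check (assuming Ollama's default port, 11434) is to hit its tags endpoint:

```shell
# Check whether the Ollama server is reachable on its default port (11434).
# /api/tags lists locally pulled models; any successful response means it's up.
if curl -fsS http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "ollama is running"
else
  echo "ollama is not running; start it with: ollama serve"
fi
```

This is only a convenience check; the web UI performs its own startup attempt as noted above.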

How logs work

  • Each new chat creates a new markdown file in logs/.
  • Filename format: YYYY-MM-DD_HHMMSS_<modelId>.md
  • Each user/assistant message is appended immediately.
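As a sketch of the documented filename pattern, here is how such a path could be built in the shell (the model id qwen2.5-7b is a hypothetical example, not necessarily one of the models in the dropdown):

```shell
# Build a log path matching the documented pattern: YYYY-MM-DD_HHMMSS_<modelId>.md
# "qwen2.5-7b" is a hypothetical model id used for illustration only.
model_id="qwen2.5-7b"
log_file="logs/$(date +%Y-%m-%d_%H%M%S)_${model_id}.md"
echo "$log_file"
```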

Troubleshooting

  • vLLM not running: click "Load Model" or check ./.run/vllm.log.
  • OOM errors: choose a smaller model or the 4-bit DeepSeek option.
  • vLLM health check fails: confirm VLLM_HOST in .env and that the server started.
  • WSL GPU missing: run nvidia-smi in WSL and verify Windows driver install.
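For the vLLM items above, the server log is usually the quickest diagnostic. A guarded look at it (using the ./.run/vllm.log path from the list above) avoids an error when the server has never been started:

```shell
# Show the tail of the vLLM server log if it exists; startup failures and
# OOM errors from the troubleshooting list above land here first.
if [ -f ./.run/vllm.log ]; then
  tail -n 50 ./.run/vllm.log
else
  echo "no vLLM log yet; has the server been started?"
fi
```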

Notes

Make scripts executable (WSL)

chmod +x scripts/*.sh
