Local WSL2 + vLLM setup with a simple FastAPI web UI for model selection, chat, and per-session markdown logs.
- Windows 11 (or Windows 10 22H2) with WSL2 enabled
- Latest NVIDIA Windows GPU drivers with WSL support
- VS Code + "Remote - WSL" extension
- Open PowerShell and install Ubuntu (if you have not already):

  ```powershell
  wsl --install -d Ubuntu
  ```
- Launch Ubuntu and let it finish first-run setup.
- From Windows, open this folder in VS Code, then use "Remote - WSL: Reopen Folder in WSL".
Inside WSL:
```bash
nvidia-smi
```
You should see your RTX 4070 listed. If not, update NVIDIA drivers on Windows and confirm WSL2 GPU support is enabled.
From the repo root (inside WSL):
```bash
cp .env.example .env
./scripts/setup_wsl_ubuntu.sh
./scripts/setup_webui.sh
```
Note: `zstd` is installed by `setup_wsl_ubuntu.sh` because Ollama uses it for model downloads.
If you want to use GGUF models with the Ollama backend:
- Install Ollama for Linux:

  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  ```

- Start the service:

  ```bash
  ollama serve
  ```

- Pull the GGUF model you want:

  ```bash
  ollama pull QuantFactory/Qwen2.5-7B-Instruct-abliterated-v2-GGUF
  ```
Alternatively, run the helper script:
```bash
./scripts/setup_ollama.sh
```
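Before loading an Ollama-backed model, you can verify the server is actually reachable. A minimal stdlib-only check (this helper is not part of the repo; it assumes Ollama's standard `/api/tags` endpoint on the default port 11434):

```python
import json
import urllib.error
import urllib.request


def ollama_running(base_url="http://localhost:11434"):
    """Return True if an Ollama server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=2) as resp:
            json.load(resp)  # /api/tags returns {"models": [...]}
        return True
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

If this returns `False`, start the service with `ollama serve` and retry.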
Start the web UI:

```bash
./scripts/run_webui.sh
```
Open http://localhost:5000 in your browser:
- Pick a model from the dropdown.
- Click "Load Model". The UI will stop any running vLLM server and start a new one with the selected model.
- Wait for the status to show "Running".
- If the selected model uses the Ollama backend, make sure `ollama serve` is running.
Note: The web UI will attempt to start Ollama and pull the model automatically when you load an Ollama-backed model.
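The stop-and-restart behavior described above can be sketched as follows. This is not the repo's actual implementation, just a rough illustration assuming vLLM's standard `vllm.entrypoints.openai.api_server` entry point:

```python
import signal
import subprocess
import sys


def restart_vllm(model_id, proc=None, port=8000):
    """Stop a previously launched vLLM server process (if any),
    then start a new one serving model_id on the given port."""
    if proc is not None and proc.poll() is None:
        proc.send_signal(signal.SIGTERM)  # graceful shutdown
        proc.wait(timeout=30)
    return subprocess.Popen(
        [sys.executable, "-m", "vllm.entrypoints.openai.api_server",
         "--model", model_id, "--port", str(port)],
    )
```

The handle returned by one call is passed back into the next, so only one vLLM server runs at a time.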
- Each new chat creates a new markdown file in `logs/`.
- Filename format: `YYYY-MM-DD_HHMMSS_<modelId>.md`
- Each user/assistant message is appended immediately.
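The filename convention above can be expressed as a small helper (an illustrative sketch, not code from the repo):

```python
from datetime import datetime


def log_filename(model_id, now=None):
    """Build a log filename like 2024-01-02_030405_<modelId>.md."""
    now = now or datetime.now()
    return f"{now:%Y-%m-%d_%H%M%S}_{model_id}.md"
```

For example, `log_filename("qwen", datetime(2024, 1, 2, 3, 4, 5))` yields `2024-01-02_030405_qwen.md`.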
- vLLM not running: click "Load Model" or check `./.run/vllm.log`.
- OOM errors: choose a smaller model or the 4-bit DeepSeek option.
- vLLM health check fails: confirm `VLLM_HOST` in `.env` and that the server started.
- WSL GPU missing: run `nvidia-smi` in WSL and verify the Windows driver install.
- vLLM runs at http://localhost:8000 (OpenAI-compatible).
- Ollama runs at http://localhost:11434 (GGUF backend).
- Web UI runs at http://localhost:5000.
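Because the vLLM server speaks the OpenAI-compatible protocol, you can query it directly with only the standard library. A minimal sketch (the model name is whatever the UI loaded; nothing here is repo-specific):

```python
import json
import urllib.request


def build_chat_payload(prompt, model):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, model, base_url="http://localhost:8000/v1"):
    """Send prompt to the vLLM server and return the reply text."""
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(build_chat_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same request works from the shell with `curl http://localhost:8000/v1/chat/completions` and a JSON body.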
- Model list is defined in `config/models.json`.
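For orientation, an entry in `config/models.json` might pair a model ID with its backend. The field names below are assumptions for illustration, not the repo's actual schema — check the file itself for the real structure:

```json
{
  "models": [
    { "id": "Qwen/Qwen2.5-7B-Instruct", "backend": "vllm" },
    { "id": "QuantFactory/Qwen2.5-7B-Instruct-abliterated-v2-GGUF", "backend": "ollama" }
  ]
}
```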
If the scripts are not executable, make them so:

```bash
chmod +x scripts/*.sh
```