Releases: weicj/vLLM-2080Ti-Definitive
v0.1.6
- Adds explicit W8A8 checkpoint support documentation for the Quark INT8 route, including the tested `nameistoken/Qwen3.6-27B-Quark-W8A8-INT8` checkpoint.
- Updates the launcher display to separate the real vLLM `--quantization` value from the display-only W/A type (`W4A16`, `W8A16`, `W8A8`).
- Adds launcher service-status cache reporting for running services. Live `used` values and 30-second refresh are shown only when vLLM exposes real cache-usage metrics; otherwise the launcher reports total cache capacity only.
- Improves launcher startup preflight for large single-file checkpoints and cleans up residual vLLM processes after failed launches.
- Fixes launcher stop handling so orphaned vLLM API servers, worker processes, and resource trackers are discovered and cleaned up instead of leaving VRAM occupied.
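
The cache-reporting fallback described above can be sketched roughly as follows. This is not the launcher's code: `cache_status` is a hypothetical helper, the total capacity is assumed to be supplied by the caller, and the metric names `vllm:gpu_cache_usage_perc` / `vllm:kv_cache_usage_perc` are assumptions about what a given vLLM build exports (read here as a fraction between 0 and 1).

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from launcher.sh.
# Reads Prometheus-style metrics text on stdin and prints one status line.
# $1 = total cache capacity in tokens (assumed to be known by the caller).
cache_status() {
  local total="$1" frac
  # Assumed metric names; a build may export either one or neither.
  # Comment lines start with '#', so they never match the anchored pattern.
  frac=$(awk '/^vllm:(gpu|kv)_cache_usage_perc/ { print $NF; exit }')
  if [ -n "$frac" ]; then
    # Usage metric found: report used and total.
    awk -v t="$total" -v f="$frac" \
      'BEGIN { printf "cache: %d/%d tokens used\n", t * f, t }'
  else
    # No usage metric: report total capacity only.
    printf 'cache: %d tokens total\n' "$total"
  fi
}
```

A possible invocation, assuming the service exposes its metrics endpoint on the local API port, would be `curl -s http://localhost:8000/metrics | cache_status 262144`; the host, port, and capacity here are illustrative.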
v0.1.5 - Launcher Profiles
This release updates the public runtime package around the new launcher and profile layout.
Highlights:
- Renames the public service manager to `launcher.sh` and keeps `build.sh` as the one-click source build entry point.
- Updates launcher modes to `safe`, `normal`, and `fast`, with Qwen3.6 profiles organized by model, mode, and weight precision.
- Adds chat template presets and service-level thinking budget defaults while keeping global runtime controls out of route profile files.
- Refreshes Qwen3.6 profile documentation and restores the KV throughput sweep SVG charts.
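
As a minimal sketch of how a script might guard against mistyped mode names, the function below accepts only the three modes listed above. `validate_mode` is a hypothetical name and is not taken from `launcher.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from launcher.sh.
# Echoes the mode if it is one of the three documented launcher modes,
# otherwise prints an error to stderr and returns non-zero.
validate_mode() {
  case "$1" in
    safe|normal|fast)
      printf '%s\n' "$1"
      ;;
    *)
      printf 'unknown mode: %s (expected safe, normal, or fast)\n' "$1" >&2
      return 1
      ;;
  esac
}
```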
Validation:
- `bash -n build.sh launcher.sh tools/validate_profiles.sh tools/evaluate_fast_modes.sh`
- `bash tools/validate_profiles.sh` (`profile_validation_ok total=11`)
- Python compile checks for touched runtime files
- launcher dry-runs for `safe` and `fast` graph policy behavior
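
The `bash -n` check above parses each script without executing it. A small wrapper along these lines could report which file failed; `check_syntax` is a hypothetical helper, not part of the repository's tooling.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around `bash -n`, which parses a script without running it.
# Prints each file that fails to parse and returns non-zero if any did.
check_syntax() {
  local failed=0 f
  for f in "$@"; do
    if ! bash -n "$f" 2>/dev/null; then
      printf 'syntax error: %s\n' "$f"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `check_syntax build.sh launcher.sh` would stay silent and return 0 when both scripts parse cleanly.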
v0.1.4 - Slim Runtime Source Release
vLLM 2080 Ti Definitive Edition v0.1.4
This release trims the public source tree to the focused SM75 runtime: source tree, one-click build entry point, interactive launcher, validated profiles, and project documentation.
Changes:
- Slims the public repository to the focused SM75 runtime source tree, launcher scripts, validated profiles, and project documentation.
- Adds the interactive start.sh service manager and one-click build.sh source build entry point.
- Keeps Docker artifacts out of this source release; Docker packaging remains a separate future deployment layer.
- Keeps active profiles in profiles/ and experimental snippets under profiles/experimental/.
- Keeps docs/model-profile-routes.md limited to tested stable FP16 routes; speed-mode and quantized-KV routes will be added after fresh validation.
- Restores the Qwen/Gemma feature matrix to the v0.1.3 capability-view wording while preserving the 0.1.4 naming cleanup: TurboQuant KV and 256K/512K labels.
- Carries forward the v0.1.3 graph-safety runtime fixes while removing upstream CI/docs/test bulk from the public source tree.
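
Given the `profiles/` and `profiles/experimental/` split described above, one way to enumerate only the active profiles is to prune the experimental subtree. This is a sketch: `list_active_profiles` is a hypothetical helper, and because the release notes do not state the profile file format, it lists every regular file.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from the repository.
# Lists regular files under the profile root, skipping the experimental subtree.
# $1 = profile root directory (defaults to "profiles").
list_active_profiles() {
  local root="${1:-profiles}"
  # -prune stops find from descending into the experimental directory.
  find "$root" -path "$root/experimental" -prune -o -type f -print | sort
}
```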
v0.1.3 - MTP Graph Safety
- Adds graph-safety handling for Native MTP + hybrid Mamba/GDN models.
- Production profiles now fall back from full decode CUDA Graph replay to PIECEWISE/NONE for this risky combination.
- Keeps the old peak-throughput path available for explicit benchmark profiles via `VLLM_ALLOW_MAMBA_SPEC_FULL_CUDAGRAPH=1`.
- Updates `VERSION`, `CHANGELOG.md`, and README release markers to `v0.1.3`.
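
The fallback rule in these notes can be illustrated as a small decision function. This is only a sketch of the logic as described: the real check presumably lives in the runtime, `pick_cudagraph_mode` and its two flag arguments are hypothetical, and `FULL` stands in for whatever full decode graph mode a profile would otherwise request.

```bash
#!/usr/bin/env bash
# Illustration only; not the runtime's actual implementation.
# $1 = 1 if native MTP speculative decoding is enabled, else 0
# $2 = 1 if the model is a hybrid Mamba/GDN architecture, else 0
pick_cudagraph_mode() {
  local mtp="$1" hybrid="$2"
  if [ "$mtp" = 1 ] && [ "$hybrid" = 1 ] \
     && [ "${VLLM_ALLOW_MAMBA_SPEC_FULL_CUDAGRAPH:-0}" != 1 ]; then
    # Risky combination without the explicit opt-in: avoid full graph replay.
    echo PIECEWISE
  else
    # Either the combination is not risky, or the benchmark override is set.
    echo FULL
  fi
}
```

With the override unset, `pick_cudagraph_mode 1 1` selects the piecewise path; exporting `VLLM_ALLOW_MAMBA_SPEC_FULL_CUDAGRAPH=1` first restores the full-graph choice for benchmark runs.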