Releases: weicj/vLLM-2080Ti-Definitive

v0.1.6

09 Jun 12:47

  • Adds explicit W8A8 checkpoint support documentation for the Quark INT8 route,
    including the tested nameistoken/Qwen3.6-27B-Quark-W8A8-INT8 checkpoint.
  • Updates the launcher display to separate the real vLLM --quantization
    value from the display-only W/A type (W4A16, W8A16, W8A8).
  • Adds cache reporting to the launcher's service status for running services.
    Live usage values and the 30-second refresh are shown only when vLLM exposes
    real cache-usage metrics; otherwise the launcher reports total cache capacity only.
  • Improves launcher startup preflight for large single-file checkpoints and
    cleans up residual vLLM processes after failed launches.
  • Fixes launcher stop handling so orphaned vLLM API servers, worker processes,
    and resource trackers are discovered and cleaned up instead of leaving VRAM
    occupied.
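
The cache-reporting rule above (live usage only when a real metric exists, capacity only otherwise) can be sketched as a small formatter. This is a minimal illustration, not the launcher's actual code: the function name `format_cache_status`, the token unit, and the output wording are all assumptions made for the example; only the 30-second refresh and the capacity-only fallback come from the notes.

```python
from typing import Optional

REFRESH_SECONDS = 30  # refresh interval named in the release note


def format_cache_status(total_tokens: int, usage_fraction: Optional[float]) -> str:
    """Return a one-line cache report for a running service.

    usage_fraction is None when the server exposes no real cache-usage
    metric (hypothetical convention for this sketch).
    """
    if usage_fraction is None:
        # No live metric: report capacity only, never a guessed "used" value.
        return f"cache: total {total_tokens} tokens"
    used = round(total_tokens * usage_fraction)
    return (
        f"cache: used {used} / total {total_tokens} tokens "
        f"({usage_fraction:.1%}, refresh {REFRESH_SECONDS}s)"
    )
```

Keeping the "no metric" case as an explicit `None` rather than `0.0` avoids displaying an idle-looking cache when the value is simply unknown.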

v0.1.5 - Launcher Profiles

08 Jun 06:36

This release updates the public runtime package around the new launcher and profile layout.

Highlights:

  • Renames the public service manager to launcher.sh and keeps build.sh as the one-click source build entry point.
  • Updates launcher modes to safe, normal, and fast, with Qwen3.6 profiles organized by model, mode, and weight precision.
  • Adds chat template presets and service-level thinking budget defaults while keeping global runtime controls out of route profile files.
  • Refreshes Qwen3.6 profile documentation and restores the KV throughput sweep SVG charts.

Validation:

  • bash -n build.sh launcher.sh tools/validate_profiles.sh tools/evaluate_fast_modes.sh
  • bash tools/validate_profiles.sh (profile_validation_ok total=11)
  • Python compile checks for touched runtime files
  • launcher dry-runs for safe and fast graph policy behavior
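
The "Python compile checks" step can be reproduced with the standard-library `py_compile` module. The sketch below is one way to run such a check, assuming the caller supplies the list of touched files; the helper name `failed_compiles` is hypothetical and the release does not say which tool was actually used.

```python
import py_compile
from typing import Iterable, List


def failed_compiles(paths: Iterable[str]) -> List[str]:
    """Byte-compile each file and return the paths that fail to compile."""
    failures = []
    for path in paths:
        try:
            # doraise=True turns compile errors into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError:
            failures.append(path)
    return failures
```

An empty return list means every file compiled; a missing file still raises `FileNotFoundError`, which this sketch deliberately does not swallow.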

v0.1.4 - Slim Runtime Source Release

06 Jun 05:06

vLLM 2080 Ti Definitive Edition v0.1.4

This release trims the public repository to the focused SM75 runtime: source tree, one-click build entry point, interactive launcher, validated profiles, and project documentation.

Changes:

  • Slims the public repository to the focused SM75 runtime source tree, launcher scripts, validated profiles, and project documentation.
  • Adds the interactive start.sh service manager and one-click build.sh source build entry point.
  • Keeps Docker artifacts out of this source release; Docker packaging remains a separate future deployment layer.
  • Keeps active profiles in profiles/ and experimental snippets under profiles/experimental/.
  • Keeps docs/model-profile-routes.md limited to tested stable FP16 routes; speed-mode and quantized-KV routes will be added after fresh validation.
  • Restores the Qwen/Gemma feature matrix to the v0.1.3 capability-view wording while preserving the 0.1.4 naming cleanup: TurboQuant KV and 256K/512K labels.
  • Carries forward the v0.1.3 graph-safety runtime fixes while removing upstream CI/docs/test bulk from the public source tree.

v0.1.3 - MTP Graph Safety

05 Jun 14:47
41f231a

  • Adds graph-safety handling for Native MTP + hybrid Mamba/GDN models.
  • Production profiles now fall back from full decode CUDA Graph replay to PIECEWISE/NONE for this risky combination.
  • Keeps the old peak-throughput path available for explicit benchmark profiles via VLLM_ALLOW_MAMBA_SPEC_FULL_CUDAGRAPH=1.
  • Updates VERSION, CHANGELOG.md, and README release markers to v0.1.3.
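
The fallback policy described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the runtime's implementation: the function name `pick_graph_mode`, its boolean parameters, and the choice of `PIECEWISE` as the default fallback are hypothetical (the notes say only "PIECEWISE/NONE"). The environment variable name is taken from the notes.

```python
import os
from typing import Mapping

OVERRIDE_ENV = "VLLM_ALLOW_MAMBA_SPEC_FULL_CUDAGRAPH"  # from the release note


def pick_graph_mode(
    requested: str,
    native_mtp: bool,
    hybrid_mamba_gdn: bool,
    env: Mapping[str, str] = os.environ,
    fallback: str = "PIECEWISE",  # assumption: could equally be "NONE"
) -> str:
    """Return the CUDA Graph mode to use for a profile."""
    risky = native_mtp and hybrid_mamba_gdn
    if requested != "FULL" or not risky:
        # Only FULL replay on the risky combination is downgraded.
        return requested
    if env.get(OVERRIDE_ENV) == "1":
        # Explicit benchmark opt-in keeps the old peak-throughput path.
        return "FULL"
    return fallback
```

Passing `env` as a parameter keeps the policy testable without mutating the process environment.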