Releases · weicj/vLLM-2080Ti-Definitive

09 Jun 12:47

weicj

v0.1.6

7948e09

v0.1.6 Latest

Adds explicit documentation of W8A8 checkpoint support for the Quark INT8 route, including the tested nameistoken/Qwen3.6-27B-Quark-W8A8-INT8 checkpoint.
Updates the launcher display to separate the real vLLM --quantization value from the display-only W/A type (W4A16, W8A16, W8A8).
Adds cache reporting to the launcher's service status for running services. Live usage values and the 30-second refresh are shown only when vLLM exposes real cache-usage metrics; otherwise the launcher reports total cache capacity only.
Improves launcher startup preflight for large single-file checkpoints and cleans up residual vLLM processes after failed launches.
Fixes launcher stop handling so that orphaned vLLM API servers, worker processes, and resource trackers are discovered and cleaned up instead of leaving VRAM occupied.
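
The cache-reporting fallback described above can be sketched roughly as follows. This is a minimal illustration, not the launcher's actual code: the function names, the status-line wording, and the token-based capacity unit are hypothetical, and the metric names are assumptions that may differ between vLLM builds.

```python
import re
from typing import Optional

# Assumed gauge names; vLLM's Prometheus metric names vary by version,
# so adjust these for the build in use.
USAGE_METRICS = ("vllm:kv_cache_usage_perc", "vllm:gpu_cache_usage_perc")

# Matches `name value`, `name{labels} value`, with an optional timestamp.
SAMPLE_RE = re.compile(
    r"^(?P<name>[^\s{]+)(?:\{[^}]*\})?\s+(?P<value>\S+)(?:\s+\d+)?$"
)


def parse_cache_usage(metrics_text: str) -> Optional[float]:
    """Return the cache usage fraction (0.0-1.0), or None if not exposed."""
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group("name") in USAGE_METRICS:
            try:
                return float(match.group("value"))
            except ValueError:
                return None
    return None


def format_cache_status(total_tokens: int, metrics_text: Optional[str]) -> str:
    """Show live usage only when a real usage metric is available."""
    usage = parse_cache_usage(metrics_text) if metrics_text else None
    if usage is None:
        # No usable metric: report total capacity only.
        return f"KV cache: {total_tokens} tokens total"
    used = round(total_tokens * usage)
    return f"KV cache: {used}/{total_tokens} tokens used ({usage:.0%}), refresh 30s"
```

Keeping the parser separate from the formatter means the fallback path is a single `None` check, so a missing or renamed metric degrades to the capacity-only line rather than showing a misleading zero.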

08 Jun 06:36

weicj

v0.1.5

00c98e9

v0.1.5 - Launcher Profiles

This release updates the public runtime package around the new launcher and profile layout.

Highlights:

Renames the public service manager to launcher.sh and keeps build.sh as the one-click source build entry point.
Updates launcher modes to safe, normal, and fast, with Qwen3.6 profiles organized by model, mode, and weight precision.
Adds chat template presets and service-level thinking budget defaults while keeping global runtime controls out of route profile files.
Refreshes Qwen3.6 profile documentation and restores the KV throughput sweep SVG charts.

Validation:

bash -n build.sh launcher.sh tools/validate_profiles.sh tools/evaluate_fast_modes.sh
bash tools/validate_profiles.sh (output: profile_validation_ok total=11)
Python compile checks for touched runtime files
Launcher dry runs for safe and fast graph policy behavior
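
A Python compile check like the one listed above could look something like this. It is only a sketch under assumptions: the release does not say how the check was run, and `compile_check` is a hypothetical helper built on the standard-library `py_compile` module.

```python
import py_compile
from typing import Iterable, List


def compile_check(paths: Iterable[str]) -> List[str]:
    """Byte-compile each file and return the paths that failed."""
    failed = []
    for path in paths:
        try:
            # doraise=True raises PyCompileError on a syntax error
            # instead of only printing it to stderr.
            py_compile.compile(path, doraise=True)
        except (py_compile.PyCompileError, OSError):
            # OSError covers unreadable or missing files.
            failed.append(path)
    return failed
```

An empty return value means every file compiled, which plays the same role for Python sources that `bash -n` plays for the shell scripts.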

06 Jun 05:06

weicj

v0.1.4

ba44cb1

v0.1.4 - Slim Runtime Source Release

vLLM 2080 Ti Definitive Edition v0.1.4

This release trims the public source tree to the focused SM75 runtime: source tree, one-click build entry point, interactive launcher, validated profiles, and project documentation.

Changes:

Slims the public repository to the focused SM75 runtime source tree, launcher scripts, validated profiles, and project documentation.
Adds the interactive start.sh service manager and one-click build.sh source build entry point.
Keeps Docker artifacts out of this source release; Docker packaging remains a separate future deployment layer.
Keeps active profiles in profiles/ and experimental snippets under profiles/experimental/.
Keeps docs/model-profile-routes.md limited to tested stable FP16 routes; speed-mode and quantized-KV routes will be added after fresh validation.
Restores the Qwen/Gemma feature matrix to the v0.1.3 capability-view wording while preserving the 0.1.4 naming cleanup: TurboQuant KV and 256K/512K labels.
Carries forward the v0.1.3 graph-safety runtime fixes while removing upstream CI/docs/test bulk from the public source tree.

05 Jun 14:47

weicj

v0.1.3

41f231a

v0.1.3 - MTP Graph Safety

Adds graph-safety handling for Native MTP + hybrid Mamba/GDN models.
Production profiles now fall back from full decode CUDA Graph replay to PIECEWISE/NONE for this risky combination.
Keeps the old peak-throughput path available for explicit benchmark profiles via VLLM_ALLOW_MAMBA_SPEC_FULL_CUDAGRAPH=1.
Updates VERSION, CHANGELOG.md, and README release markers to v0.1.3.
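
The fallback rule in these notes can be pictured with a small sketch. Only the environment variable name comes from the release notes; the function, its parameters, and the mode strings as plain Python values are hypothetical. The notes do not say when NONE is chosen over PIECEWISE, so this sketch always falls back to PIECEWISE.

```python
import os
from typing import Mapping, Optional

ALLOW_FULL_ENV = "VLLM_ALLOW_MAMBA_SPEC_FULL_CUDAGRAPH"


def select_cudagraph_mode(
    uses_native_mtp: bool,
    is_hybrid_mamba_gdn: bool,
    requested: str = "FULL",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Downgrade FULL graph replay for the risky MTP + hybrid combination."""
    env = os.environ if env is None else env
    risky = uses_native_mtp and is_hybrid_mamba_gdn
    # Benchmark profiles may opt back in to the peak-throughput path.
    allow_full = env.get(ALLOW_FULL_ENV) == "1"
    if requested == "FULL" and risky and not allow_full:
        return "PIECEWISE"  # assumption: the notes also mention NONE
    return requested
```

Passing `env` explicitly keeps the decision testable without touching the process environment.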
