Adds explicit documentation of W8A8 checkpoint support for the Quark INT8 route, including the tested nameistoken/Qwen3.6-27B-Quark-W8A8-INT8 checkpoint.
Updates the launcher display to separate the real vLLM --quantization value from the display-only W/A type (W4A16, W8A16, W8A8).
Adds cache reporting to the launcher's service status for running services. Live usage values and a 30-second refresh are shown only when vLLM exposes real cache-usage metrics; otherwise the launcher reports total cache capacity only.
Improves launcher startup preflight for large single-file checkpoints and cleans up residual vLLM processes after failed launches.
Fixes launcher stop handling so orphaned vLLM API servers, worker processes, and resource trackers are discovered and cleaned up instead of leaving VRAM occupied.
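The cache-reporting fallback noted above (live usage only when vLLM exposes a usage metric, total capacity otherwise) could look roughly like the sketch below. This is not the launcher's actual code: the function names, the status-line wording, and the use of tokens as the capacity unit are hypothetical, and the metric names are assumptions that should be checked against the `/metrics` output of the vLLM version in use.

```python
import re
from typing import Optional

# Assumed gauge names; they differ between vLLM versions, so verify them
# against the /metrics endpoint of the server you are running.
USAGE_METRICS = ("vllm:kv_cache_usage_perc", "vllm:gpu_cache_usage_perc")


def parse_cache_usage(metrics_text: str) -> Optional[float]:
    """Return cache usage as a fraction (1.0 = full), or None if not exposed.

    Expects Prometheus text exposition format without trailing timestamps.
    """
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first "{" (labels) or whitespace.
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        if name in USAGE_METRICS:
            try:
                # The sample value is the last whitespace-separated token.
                return float(line.rsplit(None, 1)[-1])
            except ValueError:
                return None
    return None


def format_cache_status(total_tokens: int, usage: Optional[float]) -> str:
    """Hypothetical status line: live usage if known, else capacity only."""
    if usage is None:
        return f"KV cache: {total_tokens} tokens total"
    used = round(total_tokens * usage)
    return f"KV cache: {used}/{total_tokens} tokens used ({usage:.1%}), refresh 30s"
```

A caller would fetch the metrics text from the running service, pass it to `parse_cache_usage`, and hand the result to `format_cache_status`; when the metric is absent the status line silently falls back to capacity only rather than showing a made-up usage figure.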