v0.1.6

weicj released this 09 Jun 12:47
  • Adds documentation for explicit W8A8 checkpoint support on the Quark INT8 route, including the tested nameistoken/Qwen3.6-27B-Quark-W8A8-INT8 checkpoint.
  • Updates the launcher display to separate the real vLLM --quantization value from the display-only W/A type (W4A16, W8A16, W8A8).
  • Adds cache reporting to the launcher's service status for running services. Live usage values and a 30-second refresh are shown only when vLLM exposes real cache-usage metrics; otherwise the launcher reports total cache capacity only.
  • Improves launcher startup preflight for large single-file checkpoints and cleans up residual vLLM processes after failed launches.
  • Fixes launcher stop handling so orphaned vLLM API servers, worker processes, and resource trackers are discovered and cleaned up instead of leaving VRAM occupied.
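
The cache-reporting fallback described above can be sketched roughly as follows. This is a minimal illustration, not the launcher's actual code: the function names, the output strings, and the gauge name `hypothetical_cache_usage` are all assumptions made for the example, and the parser assumes Prometheus-style text lines with no trailing timestamp.

```python
from typing import Optional


def parse_gauge(metrics_text: str, name: str) -> Optional[float]:
    """Return the first sample value for gauge `name`, or None if absent.

    Assumes Prometheus text exposition lines of the form
    `name{labels} value` or `name value`, without a trailing timestamp.
    """
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        head, _, value = line.rpartition(" ")
        metric = head.split("{", 1)[0]  # drop the label set, if any
        if metric == name:
            try:
                return float(value)
            except ValueError:
                return None
    return None


def format_cache_status(total_tokens: int, usage_fraction: Optional[float]) -> str:
    """Show live usage only when a real usage metric is available."""
    if usage_fraction is None:
        # No usage metric exposed: report capacity only.
        return f"cache: {total_tokens} tokens total"
    used = round(total_tokens * usage_fraction)
    return (
        f"cache: {used}/{total_tokens} tokens used "
        f"({usage_fraction:.0%}), refresh every 30s"
    )


if __name__ == "__main__":
    # Hypothetical metrics payload; the gauge name is a placeholder.
    sample = '# TYPE hypothetical_cache_usage gauge\nhypothetical_cache_usage{model="m"} 0.25\n'
    print(format_cache_status(1000, parse_gauge(sample, "hypothetical_cache_usage")))
    print(format_cache_status(1000, parse_gauge("", "hypothetical_cache_usage")))
```

Returning `None` for a missing metric, rather than `0.0`, keeps "no data" distinct from "empty cache", which is what lets the status line fall back to capacity only.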