Description
When mlx-stack bench <hf-repo> starts a temporary vllm-mlx instance and that instance fails to become healthy, its PID file is not cleaned up, leaving stale PID files in ~/.mlx-stack/pids/.
Steps to Reproduce
# Use an invalid/nonexistent HF repo
mlx-stack bench mlx-community/nonexistent-model
# Check for stale PID file
ls ~/.mlx-stack/pids/bench-temp-*
# bench-temp-mlx-community--nonexistent-model.pid still exists and contains the PID of a dead process
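To confirm that a leftover file really points at a dead process, a small check can be run against the PID directory. This is a hypothetical diagnostic, not part of mlx-stack. It assumes each PID file holds a single decimal PID and relies on the POSIX behavior of os.kill(pid, 0), which checks for the process without sending a signal.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        # Signal 0 performs error checking only; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def find_stale_pid_files(pid_dir: Path, pattern: str = "bench-temp-*.pid") -> list[Path]:
    """List PID files whose recorded PID no longer refers to a running process."""
    stale = []
    for pid_file in sorted(pid_dir.glob(pattern)):
        try:
            pid = int(pid_file.read_text().strip())
        except ValueError:
            # Unparseable contents are reported as stale as well.
            stale.append(pid_file)
            continue
        if not pid_is_alive(pid):
            stale.append(pid_file)
    return stale


if __name__ == "__main__":
    pid_dir = Path.home() / ".mlx-stack" / "pids"
    for path in find_stale_pid_files(pid_dir):
        print(f"stale: {path}")
```

After the reproduction steps above, the bench-temp-*.pid file left behind should show up in this list.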
Expected Behavior
The _cleanup_temp_instance() function should remove the PID file when the temporary instance fails health checks or crashes during startup.
Actual Behavior
The PID file persists after the process dies. The cleanup code path appears to run, but it does not remove the PID file.
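The expected behavior amounts to removing the PID file unconditionally on every failure path. Below is a minimal sketch of that pattern; it is not mlx-stack's code. The functions pid_file_path, start_temp_instance and cleanup_temp_instance are hypothetical stand-ins, and the last one only loosely mirrors _cleanup_temp_instance(). The is_healthy callable stands in for whatever health probe the real code uses. The file-naming rule (replacing "/" with "--") is inferred from the file name in the reproduction steps.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable


def pid_file_path(pid_dir: Path, hf_repo: str) -> Path:
    """Hypothetical: "mlx-community/nonexistent-model" ->
    bench-temp-mlx-community--nonexistent-model.pid (inferred from the report)."""
    return pid_dir / f"bench-temp-{hf_repo.replace('/', '--')}.pid"


def cleanup_temp_instance(proc: subprocess.Popen, pid_file: Path,
                          timeout: float = 5.0) -> None:
    """Stop the process if it is still running, then always remove its PID file."""
    try:
        if proc.poll() is None:  # still running
            proc.terminate()
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
    finally:
        # Runs whether the process was alive, had already crashed,
        # or terminate()/wait() raised.
        pid_file.unlink(missing_ok=True)


def start_temp_instance(cmd: list[str], pid_file: Path,
                        is_healthy: Callable[[], bool],
                        health_timeout: float = 30.0,
                        poll_interval: float = 0.5) -> subprocess.Popen:
    """Start cmd, record its PID, and wait until is_healthy() returns True.

    On any failure (early exit, health timeout, exception) the instance and
    its PID file are cleaned up before the error propagates.
    """
    pid_file.parent.mkdir(parents=True, exist_ok=True)
    proc = subprocess.Popen(cmd)
    try:
        pid_file.write_text(str(proc.pid))
        deadline = time.monotonic() + health_timeout
        while time.monotonic() < deadline:
            if proc.poll() is not None:
                raise RuntimeError(
                    f"instance exited during startup (exit code {proc.returncode})")
            if is_healthy():
                return proc
            time.sleep(poll_interval)
        raise TimeoutError("instance did not become healthy before the deadline")
    except BaseException:
        cleanup_temp_instance(proc, pid_file)
        raise
```

The unlink() call sits in a finally clause, so the file is removed even when the process has already died before cleanup runs. That is the situation described under Actual Behavior.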
Additional Finding: bench standard fails with timeout
Running mlx-stack bench standard (gemma-3-4b-it-qat) fails with:
Benchmark error: HTTP error during benchmark: peer closed connection without sending complete message body (incomplete chunked read)
This appears to be a timeout issue when benchmarking larger models with the default 1024-token prompt. The vllm-mlx server may be closing the connection before the benchmark completes.
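To check the timeout hypothesis, one rough approach is to estimate how long a single benchmark request should take and compare that estimate with any timeout configured on the client or the server. The sketch below is hypothetical. read_timeout_budget is not part of mlx-stack or vllm-mlx. The 256-token output length and the throughput figures are placeholders rather than measurements. It also assumes that prefill time and decode time add roughly linearly.

```python
def read_timeout_budget(prompt_tokens: int, max_new_tokens: int,
                        prefill_tok_per_s: float, decode_tok_per_s: float,
                        safety_factor: float = 2.0, floor_s: float = 30.0) -> float:
    """Rough upper bound, in seconds, for one non-batched completion request.

    expected = prompt_tokens / prefill_rate + max_new_tokens / decode_rate,
    scaled by safety_factor and never below floor_s.
    """
    if prefill_tok_per_s <= 0 or decode_tok_per_s <= 0:
        raise ValueError("throughput values must be positive")
    expected = prompt_tokens / prefill_tok_per_s + max_new_tokens / decode_tok_per_s
    return max(floor_s, expected * safety_factor)


if __name__ == "__main__":
    # Placeholder throughputs; measure real values on the target machine.
    budget = read_timeout_budget(prompt_tokens=1024, max_new_tokens=256,
                                 prefill_tok_per_s=512, decode_tok_per_s=16)
    print(f"suggested read timeout: {budget:.1f} s")
```

With these placeholder numbers the script prints "suggested read timeout: 36.0 s" (2 s prefill plus 16 s decode, doubled). If the real timeouts in the request path are well below such an estimate, the timeout explanation becomes more plausible.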
Impact
Medium — stale PID files can accumulate and confuse the process management system.