bug: bench leaves stale PID file when temporary instance fails to start #44

@weklund

Description

When mlx-stack bench <hf-repo> starts a temporary vllm-mlx instance that fails to become healthy, the PID file is not cleaned up. This leaves stale PID files in ~/.mlx-stack/pids/.

Steps to Reproduce

# Use an invalid/nonexistent HF repo
mlx-stack bench mlx-community/nonexistent-model

# Check for stale PID file
ls ~/.mlx-stack/pids/bench-temp-*
# bench-temp-mlx-community--nonexistent-model.pid exists with dead PID

Expected Behavior

The _cleanup_temp_instance() function should remove the PID file when the temporary instance fails health checks or crashes during startup.
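One way to guarantee this is to tie PID file removal to a `finally` block, so it runs whether the benchmark succeeds, the health check fails, or startup crashes. The following is a minimal sketch, not the actual mlx-stack code: `run_with_temp_instance` and the `start`, `wait_healthy`, and `benchmark` callables are hypothetical stand-ins for the real bench internals, and stopping the process itself is left out.

```python
from pathlib import Path
from typing import Callable


def run_with_temp_instance(
    pid_file: Path,
    start: Callable[[], int],
    wait_healthy: Callable[[int], None],
    benchmark: Callable[[], str],
) -> str:
    """Run a benchmark against a temporary instance and always remove its PID file.

    All three callables are hypothetical: `start` returns the PID of the new
    process, `wait_healthy` raises if the instance never becomes healthy, and
    `benchmark` returns the benchmark result.
    """
    pid_file.parent.mkdir(parents=True, exist_ok=True)
    try:
        pid = start()
        pid_file.write_text(f"{pid}\n")
        wait_healthy(pid)  # raises when health checks fail
        return benchmark()
    finally:
        # Runs on success, on health-check failure, and on any other exception.
        # missing_ok avoids a second error if the file was never written.
        pid_file.unlink(missing_ok=True)
```

With this shape, a failure in `wait_healthy` still propagates to the caller, but the PID file is gone by the time the error is reported.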

Actual Behavior

The PID file persists after the process dies. The cleanup code path appears to run, but it does not remove the PID file.

Additional Finding: bench standard fails with timeout

Running mlx-stack bench standard (gemma-3-4b-it-qat) fails with:

Benchmark error: HTTP error during benchmark: peer closed connection without sending complete message body (incomplete chunked read)

This appears to be a timeout issue when benchmarking larger models with the default 1024-token prompt. The vllm-mlx server may be closing the connection before the benchmark completes.

Impact

Medium: stale PID files can accumulate and confuse the process management system.
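Until the cleanup is fixed, leftover files can be found by checking whether the recorded PID still belongs to a live process. The sketch below assumes each PID file holds a single integer PID, which this issue does not confirm, and the helper names `pid_is_alive` and `find_stale_pid_files` are hypothetical. It is POSIX-only and cannot detect PID reuse by an unrelated process.

```python
import os
from pathlib import Path
from typing import List


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_stale_pid_files(pid_dir: Path, pattern: str = "bench-temp-*.pid") -> List[Path]:
    """List PID files whose recorded process is gone or whose content is invalid."""
    stale = []
    for pid_file in sorted(pid_dir.glob(pattern)):
        try:
            pid = int(pid_file.read_text().strip())
        except (OSError, ValueError):
            stale.append(pid_file)  # unreadable or malformed files count as stale
            continue
        if not pid_is_alive(pid):
            stale.append(pid_file)
    return stale
```

For example, `find_stale_pid_files(Path.home() / ".mlx-stack" / "pids")` would list the leftover `bench-temp-*` files from the reproduction steps, which can then be reviewed and deleted.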
