LightWeight is a high-performance, privacy-first CLI tool for running large LLMs (such as Llama-3.1 70B or Qwen-32B) on everyday consumer laptops without overheating or crashing.
- iMatrix Quantization: Preserves ~95% of model intelligence even at extreme compression.
- Thermal Shield: Intelligently caps CPU threads to keep your laptop cool and silent.
- Smart Memory Manager: Real-time RAM monitoring that prevents "Out of Memory" crashes by auto-compacting the KV-cache.
- Bulletproof Portability: A single self-contained binary that runs on Windows, Linux, and macOS with no extra drivers.
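To make the memory-manager idea above concrete, here is a minimal sketch (illustrative only — not LightWeight's actual implementation) of auto-compacting a KV-cache: when the cache's estimated footprint crosses a soft budget, the oldest entries are evicted instead of letting the process run out of RAM.

```python
from collections import deque

class KVCacheGuard:
    """Sketch of a budget-bound KV-cache (hypothetical, not LightWeight's code)."""

    def __init__(self, budget_mb, entry_mb):
        self.budget_mb = budget_mb  # soft RAM ceiling for the cache
        self.entry_mb = entry_mb    # estimated size of one cached token's K/V pair
        self.cache = deque()

    def used_mb(self):
        return len(self.cache) * self.entry_mb

    def append(self, kv_entry):
        self.cache.append(kv_entry)
        # Auto-compact: evict oldest entries until back under budget,
        # rather than growing until the OS kills the process.
        while self.used_mb() > self.budget_mb:
            self.cache.popleft()

guard = KVCacheGuard(budget_mb=8, entry_mb=1)
for token in range(12):            # simulate a 12-token generation
    guard.append(f"kv-{token}")

print(len(guard.cache), guard.cache[0])  # 8 entries kept; oldest survivor is kv-4
```

Dropping the oldest tokens is the simplest compaction policy; a production implementation would more likely quantize or page out entries, but the budget-check-on-append loop is the core idea.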
Install on Windows (PowerShell):

```powershell
irm https://lightweight.zecoryx.uz/install.ps1 | iex
```

Install on Linux/macOS:

```shell
curl -sSf https://lightweight.zecoryx.uz/install.sh | sh
```

Analyze whether your laptop can handle a model before you spend time downloading it:
```shell
lightweight check llama3:70b
```

Download a model — LightWeight automatically finds the smartest (iMatrix) and most efficient version for your RAM:
```shell
lightweight pull qwen:32b
```

Start an optimized, private conversation. Use `@filename` to inject code or file context:
```shell
lightweight chat qwen:32b
```

Host an OpenAI-compatible API to use with Cursor, VS Code, or other devices:
```shell
lightweight serve --port 8000
```

See exactly how much space your AI library is taking:
```shell
lightweight storage
```

Planned features, currently archived in the /futures directory:
- Multimodal Support: Native compressed Stable Diffusion and Whisper integration.
- Text-to-Video: SVD (Stable Video Diffusion) optimizations.
- Speculative Decoding: Blazing fast inference using tiny draft models.
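For readers curious how that last roadmap item works, speculative decoding has a simple core loop: a tiny draft model proposes several tokens cheaply, and the large target model verifies them in one pass, keeping the longest agreeing prefix. A toy, self-contained sketch — the "models" here are stand-in arithmetic rules, not real LLMs:

```python
def draft_next(x):
    # Hypothetical tiny draft model: fast, but wrong once x >= 5.
    return x + 1 if x < 5 else x + 2

def target_next(x):
    # Hypothetical large target model: the "ground truth" next token.
    return x + 1

def draft_model(prefix, k):
    # Draft k tokens autoregressively from the cheap model.
    out, last = [], prefix[-1]
    for _ in range(k):
        last = draft_next(last)
        out.append(last)
    return out

def speculative_step(prefix, k=4):
    proposed = draft_model(prefix, k)
    accepted = []
    for tok in proposed:
        cur = (prefix + accepted)[-1]
        if target_next(cur) == tok:
            accepted.append(tok)               # target agrees: keep draft token
        else:
            accepted.append(target_next(cur))  # reject: take target's token, stop
            break
    return accepted

print(speculative_step([3]))  # → [4, 5, 6]
```

Here two draft tokens are accepted and the third is corrected, so three tokens come out of a single verification pass instead of three sequential calls to the large model — which is where the speedup comes from.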
MIT © LightWeight Team