What we need even more is an API for LLMs and embedding models: most Linux environments cannot provide GPUs with large memory, so deploying embedding models locally through vLLM is difficult and costly. Thank you. Very good product with a promising future.
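For illustration, here is a minimal sketch of what consuming a hosted embedding API could look like from a machine without a large-memory GPU, assuming an OpenAI-compatible `/v1/embeddings` endpoint. The base URL, model name, and API key are hypothetical placeholders, not this project's actual API.

```python
# Minimal sketch: call a hosted, OpenAI-compatible embeddings endpoint
# instead of serving the model locally with vLLM. The base URL, model
# name, and API key below are hypothetical placeholders.
import requests

API_BASE = "https://api.example.com/v1"  # hypothetical hosted endpoint
API_KEY = "sk-..."                       # placeholder credential

resp = requests.post(
    f"{API_BASE}/embeddings",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "text-embedding-example", "input": ["hello world"]},
    timeout=30,
)
resp.raise_for_status()
# OpenAI-style response shape: {"data": [{"embedding": [...]}, ...]}
vector = resp.json()["data"][0]["embedding"]
print(len(vector))
```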