# Introduction to vLLM

https://github.com/vllm-project/vllm
https://docs.vllm.ai/


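vLLM is a fast, easy-to-use, open-source library for large language model (LLM) inference and serving, originally developed in the Sky Computing Lab at UC Berkeley. Despite the name, vLLM is not itself a model. It is an engine that loads an existing model (for example, one downloaded from Hugging Face) and runs text generation on it with high throughput and efficient memory use.

Key features of vLLM include:

- PagedAttention: the attention key/value (KV) cache is stored in fixed-size blocks, similar to virtual-memory pages, which reduces memory fragmentation and lets more requests run concurrently.
- Continuous batching: incoming requests are batched dynamically, which keeps hardware utilization and throughput high.
- Quantization: quantized models (for example GPTQ, AWQ, and FP8) are supported to reduce memory use and compute cost.
- Flexible usage: an offline Python API (`LLM`, `SamplingParams`) and an OpenAI-compatible HTTP server, with tensor parallelism for spreading a model across multiple GPUs.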
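Much of the memory an LLM needs at inference time goes to the attention KV cache, which vLLM stores in fixed-size blocks (PagedAttention). The back-of-the-envelope sketch below estimates the cache size per token, the number of blocks a sequence occupies, and a rough token capacity for a given `gpu_memory_utilization`. The helper names are hypothetical, the block size of 16 tokens is an assumed default, and the model shape (28 layers, 2 KV heads, head_dim 128) is assumed for DeepSeek-R1-Distill-Qwen-1.5B; check the model's `config.json` before relying on the numbers.

```python
import math


def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes):
    """Bytes of K and V cache that one token occupies across all layers."""
    # Factor 2: each layer stores one key and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def blocks_needed(num_tokens, block_size=16):
    """Number of fixed-size KV-cache blocks a sequence of num_tokens occupies."""
    # Blocks are allocated as the sequence grows, so only the last block can be partly empty.
    return math.ceil(num_tokens / block_size)


def max_cached_tokens(gpu_bytes, gpu_memory_utilization, weight_bytes, bytes_per_token):
    """Rough upper bound on how many tokens fit in the KV cache."""
    # Ignores activations and other runtime overheads, so the real number is lower.
    budget = gpu_bytes * gpu_memory_utilization - weight_bytes
    return max(0, int(budget // bytes_per_token))


# Assumed config (verify in config.json): 28 layers, 2 KV heads, head_dim 128, FP16 = 2 bytes.
per_token = kv_cache_bytes_per_token(num_layers=28, num_kv_heads=2, head_dim=128, dtype_bytes=2)
print(per_token)                   # 28672 bytes, i.e. 28 KiB per token
print(per_token * 8192 / 2**20)    # 224.0 MiB for one 8192-token sequence
print(blocks_needed(100))          # 7 blocks of 16 tokens
```

To estimate capacity on your own machine, pass the device memory in bytes, the `gpu_memory_utilization` you plan to use, and the size of the model weights to `max_cached_tokens`.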



# Installation

Download the source code:

```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
```


## conda environment



```bash
conda create -n vllm python=3.12 -y
conda activate vllm

# Leave the environment when you are done
conda deactivate
```

## uv environment


```bash
uv venv vllm --python 3.12 --seed
source vllm/bin/activate

```

## Install with pip

```bash
uv pip install vllm  # uv
pip install vllm   # conda
```

## Install from source


```bash
# Run inside the cloned vllm directory; newer releases keep this file at requirements/cpu.txt
pip install -r requirements-cpu.txt
pip install -e .
```

Note: On macOS, `VLLM_TARGET_DEVICE` is automatically set to `cpu`, which is currently the only supported device there.


# Basic SDK usage





Download the example models:

```bash
huggingface-cli download facebook/opt-125m --local-dir opt-125m
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local-dir DeepSeek-R1-Distill-Qwen-1.5B
```
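The same downloads can be scripted from Python with `huggingface_hub.snapshot_download`, which fetches every file of a repository into `local_dir` and returns that path. This is a sketch: `local_dir_for` and `download_model` are hypothetical helper names, and `huggingface_hub` is imported lazily so the helpers can be defined without it installed.

```python
from pathlib import Path


def local_dir_for(repo_id: str, base_dir: str = ".") -> Path:
    """Mirror `--local-dir <name>`: use the last component of the repo id as the folder name."""
    return Path(base_dir) / repo_id.split("/")[-1]


def download_model(repo_id: str, base_dir: str = ".") -> str:
    """Download a Hugging Face repository into base_dir/<model name> and return the path."""
    # Lazy import: only needed when a download actually happens.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, base_dir)
    return snapshot_download(repo_id=repo_id, local_dir=str(target))
```

For example, `download_model("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", base_dir="models")` places the files under `models/DeepSeek-R1-Distill-Qwen-1.5B`, the layout the SDK example expects relative to its base path.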





```python
import os

from vllm import LLM, SamplingParams

# Base directory that contains the models/ folder (assumption: adjust to your own layout).
basePath = os.path.abspath(".")
# Download/cache directory for models (not used further in this example).
cache_dir = os.path.join(basePath, "models", "cache")
# None lets vLLM load the tokenizer that ships with the model.
tokenizer = None

# Sample prompts.
# A single prompt; the trailing "<think>\n" prompts the R1-distilled model
# to start with its reasoning section.
prompts_v01 = ["Which came first, the chicken or the egg?<think>\n"]
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a SamplingParams object with the generation settings:
# temperature: randomness of the output; higher values are more random
# top_p: nucleus-sampling threshold that controls output diversity
# max_tokens: maximum number of tokens to generate
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=8192)

# Model name or local path. Here: a local copy downloaded with
# huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local-dir models/DeepSeek-R1-Distill-Qwen-1.5B
model = os.path.join(basePath, "models", "DeepSeek-R1-Distill-Qwen-1.5B")

# Create an LLM.
# model: model path or Hugging Face model name
# tokenizer: tokenizer name or path; None means load it from the model
# max_model_len: maximum context length (prompt plus generated tokens)
# dtype: data type of weights and activations; "half" (FP16) reduces memory use and speeds up inference
# gpu_memory_utilization: fraction of GPU memory the vLLM engine may use; 0.7 means 70%
# trust_remote_code: whether to run custom model code from the repository; some models require True
llm = LLM(
    model=model,
    tokenizer=tokenizer,
    max_model_len=8192,
    dtype="half",
    gpu_memory_utilization=0.7,
    trust_remote_code=True,
)

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts_v01, sampling_params)

# Print each prompt with its completion.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    # split() counts whitespace-separated words, so it undercounts text without spaces (e.g. Chinese).
    length_of_generated_text = len(generated_text.split())
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}, Length of generated text: {length_of_generated_text} words")
```
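To make `temperature` and `top_p` more concrete, here is a pure-Python illustration of the textbook technique (temperature-scaled softmax followed by nucleus sampling). It is not vLLM's implementation, which runs batched on tensors; the function names are hypothetical. Treating `temperature=0` as greedy decoding follows how vLLM documents that value.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities; a smaller temperature sharpens the distribution."""
    if temperature == 0:
        # Greedy decoding: all probability mass on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample_next_token(logits, temperature=0.8, top_p=0.95, rng=None):
    """Pick one token id: temperature scaling, then nucleus filtering, then a weighted draw."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    candidates = top_p_filter(probs, top_p)
    tokens = list(candidates)
    return rng.choices(tokens, weights=[candidates[t] for t in tokens], k=1)[0]
```

Lower `temperature` concentrates probability on the top tokens, and lower `top_p` shrinks the candidate set; together they trade diversity for predictability.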


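Running the cell in a kernel where vLLM is not installed fails with:

```text
ModuleNotFoundError: No module named 'vllm'
```

Make sure the notebook kernel uses the conda or uv environment in which vLLM was installed.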
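To see which interpreter a notebook or script is actually using and whether it can find vLLM, a small standard-library check like the following can help. It is a sketch; `vllm_status` is a hypothetical helper, and `find_spec` only locates the package without importing it.

```python
import importlib.util
import sys
from importlib import metadata


def vllm_status() -> str:
    """Describe whether the running interpreter can import vllm."""
    if importlib.util.find_spec("vllm") is None:
        return f"vllm is not importable from {sys.executable}"
    try:
        version = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        version = "unknown version"
    return f"vllm {version} is importable from {sys.executable}"


print(vllm_status())
```

If the printed interpreter path is not inside your `vllm` environment, switch the notebook kernel or reactivate the environment before running the example again.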


## recvValue failed on SocketImpl


```text
[W303 17:13:18.315241000 TCPStore.cpp:141] [c10d] recvValue failed on SocketImpl(fd=83, addr=[::ffff:192.168.255.10]:65450, remote=[::ffff:192.168.255.10]:65449): failed to recv, got 0 bytes
Exception raised from recvBytes at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/distributed/c10d/Utils.hpp:670 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) + 52 (0x1079469ec in libc10.dylib)
frame #1: void c10d::tcputil::recvBytes<unsigned int>(int, unsigned int*, unsigned long) + 352 (0x30f4a99c0 in libtorch_cpu.dylib)
frame #2: unsigned int c10d::detail::TCPClient::receiveValue<unsigned int>() + 40 (0x30f4a9704 in libtorch_cpu.dylib)
frame #3: c10d::TCPStore::ping() + 196 (0x30f4a82c8 in libtorch_cpu.dylib)
frame #4: c10d::TCPStore::TCPStore(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, c10d::TCPStoreOptions const&) + 1232 (0x30f4a7158 in libtorch_cpu.dylib)
frame #5: c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>> c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>>::make<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, c10d::TCPStoreOptions&>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, c10d::TCPStoreOptions&) + 72 (0x119fdb1b0 in libtorch_python.dylib)
frame #6: std::__1::enable_if<std::is_void<pybind11::class_<c10d::TCPStore, c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>>>>::value, pybind11::detail::void_type>::type pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned short, std::__1::optional<int>, bool, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l>>, bool, bool, std::__1::optional<int>, bool>::call<void, pybind11::gil_scoped_release, void pybind11::detail::initimpl::factory<torch::distributed::c10d::(anonymous namespace)::c10d_init(_object*, _object*)::$_37, pybind11::detail::void_type (*)(), c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>> (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned short, std::__1::optional<int>, bool, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l>>, bool, bool, std::__1::optional<int>, bool), pybind11::detail::void_type ()>::execute<pybind11::class_<c10d::TCPStore, c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>>>, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release>>(pybind11::class_<c10d::TCPStore, c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>>>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&) &&::'lambda'(pybind11::detail::value_and_holder&, std::__1::basic_string<char, std::__1::char_traits<char>, 
std::__1::allocator<char>> const&, unsigned short, std::__1::optional<int>, bool, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l>>, bool, bool, std::__1::optional<int>, bool)&>(void pybind11::detail::initimpl::factory<torch::distributed::c10d::(anonymous namespace)::c10d_init(_object*, _object*)::$_37, pybind11::detail::void_type (*)(), c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>> (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned short, std::__1::optional<int>, bool, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l>>, bool, bool, std::__1::optional<int>, bool), pybind11::detail::void_type ()>::execute<pybind11::class_<c10d::TCPStore, c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>>>, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release>>(pybind11::class_<c10d::TCPStore, c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>>>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&) &&::'lambda'(pybind11::detail::value_and_holder&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned short, std::__1::optional<int>, bool, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l>>, bool, bool, std::__1::optional<int>, bool)&) && + 164 (0x119fdad18 in libtorch_python.dylib)
frame #7: void pybind11::cpp_function::initialize<void pybind11::detail::initimpl::factory<torch::distributed::c10d::(anonymous namespace)::c10d_init(_object*, _object*)::$_37, pybind11::detail::void_type (*)(), c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>> (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned short, std::__1::optional<int>, bool, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l>>, bool, bool, std::__1::optional<int>, bool), pybind11::detail::void_type ()>::execute<pybind11::class_<c10d::TCPStore, c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>>>, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release>>(pybind11::class_<c10d::TCPStore, c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>>>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&) &&::'lambda'(pybind11::detail::value_and_holder&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned short, std::__1::optional<int>, bool, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l>>, bool, bool, std::__1::optional<int>, bool), void, pybind11::detail::value_and_holder&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned short, std::__1::optional<int>, bool, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l>>, bool, bool, std::__1::optional<int>, bool, pybind11::name, pybind11::is_method, pybind11::sibling, 
pybind11::detail::is_new_style_constructor, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release>>(pybind11::class_<c10d::TCPStore, c10::intrusive_ptr<c10d::TCPStore, c10::detail::intrusive_target_default_null_type<c10d::TCPStore>>>&&, void (*)(pybind11::detail::value_and_holder&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned short, std::__1::optional<int>, bool, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l>>, bool, bool, std::__1::optional<int>, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) + 92 (0x119fdabd4 in libtorch_python.dylib)
frame #8: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3800 (0x119594694 in libtorch_python.dylib)
frame #9: cfunction_call + 72 (0x10229d738 in python3.12)
frame #10: _PyObject_MakeTpCall + 376 (0x102234ffc in python3.12)
frame #11: method_vectorcall + 940 (0x10223adbc in python3.12)
frame #12: _PyObject_Call + 356 (0x102235fd4 in python3.12)
frame #13: slot_tp_init + 1528 (0x1022cddbc in python3.12)
frame #14: type_call + 148 (0x1022c1b58 in python3.12)
frame #15: pybind11_meta_call + 40 (0x119590840 in libtorch_python.dylib)
frame #16: _PyEval_EvalFrameDefault + 224588 (0x1023a6e68 in python3.12)
frame #17: gen_iternext + 144 (0x1022537cc in python3.12)
frame #18: builtin_next + 76 (0x10236ad80 in python3.12)
frame #19: cfunction_vectorcall_FASTCALL + 96 (0x10229e378 in python3.12)
frame #20: _PyEval_EvalFrameDefault + 223660 (0x1023a6ac8 in python3.12)
frame #21: method_vectorcall + 368 (0x10223ab80 in python3.12)
frame #22: _PyEval_EvalFrameDefault + 241916 (0x1023ab218 in python3.12)
...
frame #57: pymain_main + 552 (0x102449e70 in python3.12)
frame #58: main + 56 (0x1021c48e4 in python3.12)
frame #59: start + 2360 (0x1984020e0 in dyld)
```
