[Bug]: When tensor_parallel_size>1, RuntimeError: Cannot re-initialize CUDA in forked subprocess. #6152
Comments
Please paste your full code. You might have initialized CUDA before using vLLM.
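To make that failure mode concrete, here is a hypothetical sketch (not the reporter's actual code) of how initializing CUDA in the parent process poisons vLLM's forked workers:

```python
# Hypothetical repro sketch: any call that creates a CUDA context in the
# parent process before vLLM forks its tensor-parallel workers will break
# those forks, because a forked child cannot re-initialize CUDA.
import torch

torch.zeros(1).cuda()  # creates a CUDA context in the parent process

from vllm import LLM

# With tensor_parallel_size > 1 and the default "fork" worker start method,
# engine construction can now fail with:
#   RuntimeError: Cannot re-initialize CUDA in forked subprocess.
llm = LLM(model="google/gemma-2-27b-it", tensor_parallel_size=2)
```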
I'm also having the same issue with the latest version of vLLM + gemma-2-27b-it.
I can run the following code without any issues:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model="google/gemma-2-27b-it", tensor_parallel_size=2)
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
It works for me. Thank you~
I'm not sure. Maybe it's a version issue with PowerInfer? I also find that even when I use the above solution to get vLLM generating, the quality is not as good as other Gemma-2-27B inference setups (both under greedy decoding).
It also works for me! Thank you!
I have encountered the same issue. I solved it by making …
@rin2401 try to use …
If you're using this in a Python notebook, run the following first on a reset kernel:
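(The original snippet was truncated; the cell below is a reconstruction of the usual notebook-side fix, not the commenter's verbatim code.)

```python
import os

# Assumed fix: select the spawn worker start method before vllm is imported
# or any CUDA work happens in the kernel.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
```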
Your current environment
vllm version: '0.5.0.post1'
🐛 Describe the bug
When I set tensor_parallel_size=1, it works well.
But if I set tensor_parallel_size>1, the following error occurs:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.
After I add …, the same RuntimeError still occurs.